Commit 607a1f48 authored by: liuqi

Refactor: move all files in kernels directory to ops and remove kernels.

1. Move all files in kernels to ops
2. Unify the op_def_registry and op_registry.
3. Support op clipping: do not link an op's code when the op is not registered.
Parent d5679c53
......@@ -67,7 +67,6 @@ extra_tests:
stage: extra_tests
script:
- if [ -z "$TARGET_SOCS" ]; then TARGET_SOCS=random; fi
- python tools/bazel_adb_run.py --target="//mace/kernels:kernels_test" --run_target=True --stdout_processor=unittest_stdout_processor --target_abis=armeabi-v7a,arm64-v8a --target_socs=$TARGET_SOCS
- python tools/bazel_adb_run.py --target="//mace/utils:tuner_test" --run_target=True --stdout_processor=unittest_stdout_processor --target_abis=armeabi-v7a,arm64-v8a --target_socs=$TARGET_SOCS
platform_compatible_tests:
......
......@@ -94,7 +94,6 @@ jobs:
- python tools/bazel_adb_run.py --target="//mace/test:mace_api_test" --run_target=False --target_abis=armeabi-v7a || exit 1
- python tools/bazel_adb_run.py --target="//mace/test:mace_api_mt_test" --run_target=False --target_abis=armeabi-v7a || exit 1
- echo 'Extra Test'
- python tools/bazel_adb_run.py --target="//mace/kernels:kernels_test" --run_target=False --target_abis=armeabi-v7a || exit 1
- python tools/bazel_adb_run.py --target="//mace/utils:tuner_test" --run_target=False --target_abis=armeabi-v7a || exit 1
env: TYPE=Extra-Test-ARMEABI-v7a
os: linux
......@@ -106,7 +105,6 @@ jobs:
- python tools/bazel_adb_run.py --target="//mace/test:mace_api_test" --run_target=False --target_abis=arm64-v8a || exit 1
- python tools/bazel_adb_run.py --target="//mace/test:mace_api_mt_test" --run_target=False --target_abis=arm64-v8a || exit 1
- echo 'Extra Test on ARM64'
- python tools/bazel_adb_run.py --target="//mace/kernels:kernels_test" --run_target=False --target_abis=arm64-v8a || exit 1
- python tools/bazel_adb_run.py --target="//mace/utils:tuner_test" --run_target=False --target_abis=arm64-v8a || exit 1
env: TYPE=Extra-Test-ARM64-v8a
os: linux
......
......@@ -5,46 +5,24 @@ You can create a custom op if it is not supported yet.
To add a custom op, you need to follow these steps:
Register the new OpDef information
----------------------------------
Register the OpDef information, which declares the devices the operation can run on.
The registry file is `mace/ops/ops_def_register.cc`.
```c++
#include "mace/ops/ops_def_register.h"
namespace mace {
namespace ops {
void RegisterOpDefs(OpDefRegistryBase *op_def_registry) {
MACE_REGISTER_OP_DEF(
op_def_registry,
OpRegistrationBuilder("MyCustomOp")
.SetDevicePlaceFunc([]() -> std::vector<DeviceType> {
return {DeviceType::CPU, DeviceType::GPU};
}));
......
}
} // namespace ops
} // namespace mace
```
Implement the Operation
-----------------------
The best way is to refer to the implementation of another operator (e.g. `/mace/kernels/activation.cc`).
Define the new Op class in `mace/kernels/my_custom_op.cc`.
1. CPU code: just write the code in `mace/kernels/my_custom_op.cc`.
2. GPU code: Kernel API is defined in `mace/kernels/my_custom_op.h`,
Kernel based on Image is realized in `mace/kernels/opencl/image/my_custom_op.cc`,
Kernel based on Buffer is realized in `mace/kernels/opencl/buffer/my_custom_op.cc`.
The best way is to refer to the implementation of another operator (e.g. `/mace/ops/activation.cc`).
Define the new Op class in `mace/ops/my_custom_op.cc`.
1. ARM kernels: the NEON kernel is located at `mace/ops/arm/my_custom_op.cc`.
2. GPU kernels: the OpenCL kernel API is defined in `mace/ops/opencl/my_custom_op.h` (a rough interface sketch follows the code block below),
* the Image-based kernel is implemented in `mace/ops/opencl/image/my_custom_op.cc`,
* the Buffer-based kernel is implemented in `mace/ops/opencl/buffer/my_custom_op.cc`,
* the OpenCL kernel file itself lives in `mace/ops/opencl/cl/my_custom_op.cl`,
* and the path of the OpenCL kernel file must be added to `mace/repository/opencl-kernel/opencl_kernel_configure.bzl`.
The structure of an Op is shown in the following code.
```c++
#include "mace/core/operator.h"
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class MyCustomOp;
......@@ -56,43 +34,34 @@ class MyCustomOp<DeviceType::CPU, float> : public Operation {
#ifdef MACE_ENABLE_OPENCL
template <typename T>
class ActivationOp<DeviceType::GPU, T> : public Operation {
class MyCustomOp<DeviceType::GPU, T> : public Operation {
...
};
#endif // MACE_ENABLE_OPENCL
} // namespace ops
} // namespace mace
```
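For the GPU path, the header `mace/ops/opencl/my_custom_op.h` mentioned above typically declares an abstract kernel interface that the Image- and Buffer-based implementations derive from. The sketch below is only a rough, hypothetical illustration of such an interface; the class name and the `Compute` signature are assumptions for this example, not the actual MACE declarations.
```c++
// Hypothetical sketch only -- not the actual MACE header.
// Assumes the MACE core types (MaceStatus, OpContext, Tensor) are available.
#include "mace/core/op_context.h"
#include "mace/core/tensor.h"

namespace mace {
namespace ops {

class OpenCLMyCustomOpKernel {  // illustrative name
 public:
  virtual ~OpenCLMyCustomOpKernel() = default;
  // Each backend (image / buffer) provides its own Compute implementation.
  virtual MaceStatus Compute(OpContext *context,
                             const Tensor *input,
                             Tensor *output) = 0;
};

}  // namespace ops
}  // namespace mace
```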
Register the Operation
-----------------------
1. Add the registration function in `mace/kernels/my_custom_op.cc`
```c++
#include "mace/core/operator.h"
namespace mace {
namespace kernels {
void RegisterMyCustomOp(OpRegistryBase *op_registry) {
MACE_REGISTER_OP(op_registry, "MyCustomOp", ActivationOp,
MACE_REGISTER_OP(op_registry, "MyCustomOp", MyCustomOp,
DeviceType::CPU, float);
#ifdef MACE_ENABLE_OPENCL
MACE_REGISTER_OP(op_registry, "MyCustomOp", ActivationOp,
MACE_REGISTER_OP(op_registry, "MyCustomOp", MyCustomOp,
DeviceType::GPU, float);
MACE_REGISTER_OP(op_registry, "MyCustomOp", ActivationOp,
MACE_REGISTER_OP(op_registry, "MyCustomOp", MyCustomOp,
DeviceType::GPU, half);
#endif // MACE_ENABLE_OPENCL
}
} // namespace ops
} // namespace mace
```
2. Then register the new Op in `mace/kernels/ops_register.cc`.
Register the Operation
-----------------------
Register the new Op in `mace/ops/ops_register.cc`.
```
#include "mace/kernels/ops_register.h"
#include "mace/ops/ops_register.h"
namespace mace {
namespace ops {
......
......@@ -112,7 +112,8 @@ There are two common advanced use cases:
- converting model to C++ code.
- tuning GPU kernels for a specific SoC.
* **Convert model(s) to C++ code**
Convert model(s) to C++ code
--------------------------------
* **1. Change the model deployment file(.yml)**
......@@ -204,7 +205,8 @@ There are two common advanced use cases:
// ... Same with the code in basic usage
* **Tuning for specific SoC's GPU**
Tuning for specific SoC's GPU
---------------------------------
If you want to use the GPU of a specific device, you can just specify ``target_socs`` in your YAML file and
then tune the MACE lib (OpenCL kernels) for it, which may yield a 1~10% performance improvement.
......@@ -375,25 +377,52 @@ Use ``-h`` to get detailed help.
Reduce Library Size
-------------------
* Build for your own usage purpose.
* **dynamic library**
The dynamic library generated by the script ``tools/build-standalone-lib.sh`` is about ``1.6M`` for
``armeabi-v7a`` and ``2.1M`` for ``arm64-v8a``. It can be reduced by modifying some build options.
- If the models don't need to run on the ``dsp`` device, change the build option ``--define hexagon=true``
to ``false``; this reduces the library size by about ``100KB``.
- Furthermore, if only the ``cpu`` device is needed, change ``--define opencl=true`` to ``false``. This
roughly halves the library size, to about ``700KB`` for ``armeabi-v7a`` and ``1000KB`` for ``arm64-v8a``.
- About ``300KB`` can be saved by adding the ``--config symbol_hidden`` build option. It changes the
visibility of internal APIs in libmace.so and leads to linking errors when loading model(s) in ``code``
mode, but has no effect on ``file`` mode.
* **static library**
- The methods for the dynamic library are useful for the static library too. In addition, the static
library may also contain the model graph and model data if the configs ``model_graph_format`` and
``model_data_format`` in the deployment file are set to ``code``.
- It is recommended to use the ``version script`` and ``strip`` features when linking the MACE static library. The effect is remarkable.
* Remove the unused ops.
Remove the registration of the ops that are not used by your models in ``mace/ops/ops_register.cc``,
which will reduce the library size significantly; the final binary only links the registered ops' code.
```
#include "mace/ops/ops_register.h"
namespace mace {
namespace ops {
// Just leave the ops used in your models
...
} // namespace ops
OpRegistry::OpRegistry() : OpRegistryBase() {
// Just leave the ops used in your models
...
ops::RegisterMyCustomOp(this);
...
}
} // namespace mace
```
......@@ -23,9 +23,10 @@
#include <cstring>
#include "mace/core/macros.h"
#include "mace/core/registry.h"
#include "mace/core/types.h"
#include "mace/core/runtime_failure_mock.h"
#include "mace/public/mace.h"
#include "mace/utils/logging.h"
namespace mace {
......
......@@ -27,8 +27,7 @@
namespace mace {
SerialNet::SerialNet(OpDefRegistryBase *op_def_registry,
const OpRegistryBase *op_registry,
SerialNet::SerialNet(const OpRegistryBase *op_registry,
const NetDef *net_def,
Workspace *ws,
Device *target_device,
......@@ -41,15 +40,7 @@ SerialNet::SerialNet(OpDefRegistryBase *op_def_registry,
target_device->cpu_runtime()->policy(),
target_device->cpu_runtime()->use_gemmlowp())) {
MACE_LATENCY_LOGGER(1, "Constructing SerialNet");
// Register Operations
MaceStatus status;
for (int idx = 0; idx < net_def->op_types_size(); ++idx) {
status = op_def_registry->Register(net_def->op_types(idx));
MACE_CHECK(status == MaceStatus::MACE_SUCCESS, status.information());
}
// Create Operations
operators_.clear();
const OpRegistrationInfo *info;
DeviceType target_device_type = target_device_->device_type();
OpConstructContext construct_context(ws_);
for (int idx = 0; idx < net_def->op_size(); ++idx) {
......@@ -59,16 +50,13 @@ SerialNet::SerialNet(OpDefRegistryBase *op_def_registry,
ProtoArgHelper::GetOptionalArg<OperatorDef, int>(
operator_def, "device", static_cast<int>(target_device_type));
if (op_device == target_device_type) {
// Find op registration information
status = op_def_registry->Find(operator_def.type(), &info);
MACE_CHECK(status == MaceStatus::MACE_SUCCESS, status.information());
// Get available devices (sorted based on priority)
OperatorDef temp_def(operator_def);
auto available_devices = info->device_place_func_();
auto available_devices = op_registry->AvailableDevices(temp_def.type());
// Find the device type to run the op.
// If the target_device_type in available devices, use target_device_type,
// otherwise, fallback to the first device (top priority).
DeviceType device_type = available_devices[0];
// otherwise, fallback to CPU device.
DeviceType device_type = DeviceType::CPU;
construct_context.set_device(cpu_device_);
for (auto device : available_devices) {
if (device == target_device_type) {
......
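Here is a minimal standalone sketch of the device-selection fallback introduced above. The function name and the enumerator values are illustrative; in MACE, `AvailableDevices` returns a `std::set<DeviceType>` as declared in `mace/core/operator.h`.
```c++
// Minimal sketch of the new device-selection fallback; not the exact MACE code.
#include <set>

enum DeviceType { CPU, GPU, HEXAGON };  // enumerator values illustrative only

DeviceType ChooseDeviceType(const std::set<DeviceType> &available_devices,
                            DeviceType target_device_type) {
  // Default to CPU; only switch when the target device is actually
  // supported by the op (mirrors the loop in SerialNet's constructor).
  DeviceType device_type = CPU;
  for (auto device : available_devices) {
    if (device == target_device_type) {
      device_type = target_device_type;
      break;
    }
  }
  return device_type;
}
```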
......@@ -21,8 +21,6 @@
#include <unordered_map>
#include <sstream>
#include "mace/core/op_def_registry.h"
#include "mace/core/operator.h"
namespace mace {
......@@ -45,8 +43,7 @@ class NetBase {
class SerialNet : public NetBase {
public:
SerialNet(OpDefRegistryBase *op_def_registry,
const OpRegistryBase *op_registry,
SerialNet(const OpRegistryBase *op_registry,
const NetDef *net_def,
Workspace *ws,
Device *target_device,
......
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/utils/logging.h"
namespace mace {
void AddOpRegistrar(OpDefRegistryBase *registry,
const OpRegistrationBuilder &builder) {
registry->AddRegistrar(
builder.name(),
[builder](OpRegistrationInfo *info){
builder.Finalize(info);
});
}
OpRegistrationBuilder::OpRegistrationBuilder(const std::string name)
: name_(name) {}
const std::string OpRegistrationBuilder::name() const { return name_; }
OpRegistrationBuilder &OpRegistrationBuilder::SetDevicePlaceFunc(
std::vector<DeviceType> (*func)()) {
info_.device_place_func_ = func;
return *this;
}
void OpRegistrationBuilder::Finalize(OpRegistrationInfo *info) const {
*info = info_;
}
void OpDefRegistryBase::AddRegistrar(const std::string name,
const OpRegistrar &registrar) {
registrar_.emplace(name, registrar);
}
MaceStatus OpDefRegistryBase::Register(const std::string &name) {
VLOG(3) << "Registering operation definition: " << name;
if (registry_.find(name) != registry_.end()) {
return MaceStatus::MACE_SUCCESS;
}
auto iter = registrar_.find(name);
if (iter == registrar_.end()) {
return MaceStatus(MaceStatus::MACE_INVALID_ARGS,
"MACE do not support the operation: " + name);
}
registry_.emplace(
name, std::unique_ptr<OpRegistrationInfo>(new OpRegistrationInfo()));
iter->second(registry_[name].get());
return MaceStatus::MACE_SUCCESS;
}
MaceStatus OpDefRegistryBase::Find(const std::string &name,
const OpRegistrationInfo **info) {
auto iter = registry_.find(name);
if (iter == registry_.end()) {
*info = nullptr;
return MaceStatus(MaceStatus::MACE_INVALID_ARGS,
"Mace do not support the operation: " + name);
}
*info = iter->second.get();
return MaceStatus::MACE_SUCCESS;
}
} // namespace mace
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_CORE_OP_DEF_REGISTRY_H_
#define MACE_CORE_OP_DEF_REGISTRY_H_
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>
#include "mace/proto/mace.pb.h"
#include "mace/public/mace.h"
#include "mace/utils/utils.h"
namespace mace {
// Device placement function
typedef std::function<std::vector<DeviceType>()> DevicePlaceFunc;
struct OpRegistrationInfo {
OpRegistrationInfo() = default;
explicit OpRegistrationInfo(const DevicePlaceFunc &func)
: device_place_func_(func) {}
DevicePlaceFunc device_place_func_;
};
class OpRegistrationBuilder {
public:
explicit OpRegistrationBuilder(const std::string name);
const std::string name() const;
OpRegistrationBuilder &SetDevicePlaceFunc(
std::vector<DeviceType> (*func)());
void Finalize(OpRegistrationInfo *info) const;
private:
std::string name_;
OpRegistrationInfo info_;
};
class OpDefRegistryBase {
public:
typedef std::function<void(OpRegistrationInfo *)> OpRegistrar;
OpDefRegistryBase() = default;
virtual ~OpDefRegistryBase() = default;
void AddRegistrar(const std::string name, const OpRegistrar &registrar);
MaceStatus Register(const std::string &name);
MaceStatus Find(const std::string &name, const OpRegistrationInfo **info);
private:
std::unordered_map<std::string, OpRegistrar> registrar_;
std::unordered_map<
std::string,
std::unique_ptr<OpRegistrationInfo>> registry_;
MACE_DISABLE_COPY_AND_ASSIGN(OpDefRegistryBase);
};
void AddOpRegistrar(OpDefRegistryBase *registry,
const OpRegistrationBuilder &builder);
#define MACE_REGISTER_OP_DEF(op_def_registry, builder) \
AddOpRegistrar(op_def_registry, builder)
} // namespace mace
#endif // MACE_CORE_OP_DEF_REGISTRY_H_
......@@ -13,6 +13,7 @@
// limitations under the License.
#include <sstream>
#include <map>
#include <memory>
#include <vector>
......@@ -79,7 +80,26 @@ MaceStatus Operation::Init(OpInitContext *context) {
return MaceStatus::MACE_SUCCESS;
}
OpKeyBuilder::OpKeyBuilder(const char *op_name) : op_name_(op_name) {}
// op registry
namespace {
class OpKeyBuilder {
public:
explicit OpKeyBuilder(const std::string &op_name);
OpKeyBuilder &Device(DeviceType device);
OpKeyBuilder &TypeConstraint(const char *attr_name,
DataType allowed);
const std::string Build();
private:
std::string op_name_;
DeviceType device_type_;
std::map<std::string, DataType> type_constraint_;
};
OpKeyBuilder::OpKeyBuilder(const std::string &op_name) : op_name_(op_name) {}
OpKeyBuilder &OpKeyBuilder::Device(DeviceType device) {
device_type_ = device;
......@@ -103,16 +123,53 @@ const std::string OpKeyBuilder::Build() {
return ss.str();
}
} // namespace
void OpRegistrationInfo::AddDevice(mace::DeviceType device) {
devices.insert(device);
}
void OpRegistrationInfo::Register(const std::string &key, OpCreator creator) {
VLOG(3) << "Registering: " << key;
MACE_CHECK(creators.count(key) == 0, "Key already registered: ", key);
creators[key] = creator;
}
MaceStatus OpRegistryBase::Register(const std::string &op_type,
const mace::DeviceType device_type,
const mace::DataType dt,
mace::OpRegistrationInfo::OpCreator creator) {
if (registry_.count(op_type) == 0) {
registry_[op_type] = std::unique_ptr<OpRegistrationInfo>(
new OpRegistrationInfo);
}
registry_[op_type]->AddDevice(device_type);
std::string op_key = OpKeyBuilder(op_type)
.Device(device_type)
.TypeConstraint("T", dt)
.Build();
registry_.at(op_type)->Register(op_key, creator);
return MaceStatus::MACE_SUCCESS;
}
const std::set<DeviceType> OpRegistryBase::AvailableDevices(
const std::string &op_type) const {
MACE_CHECK(registry_.count(op_type) != 0,
op_type, " operation is not registered.");
return registry_.at(op_type)->devices;
}
OpRegistryBase::~OpRegistryBase() = default;
std::unique_ptr<Operation> OpRegistryBase::CreateOperation(
OpConstructContext *context,
DeviceType device_type,
const NetMode mode) const {
OperatorDef *operator_def = context->operator_def();
const int dtype = ProtoArgHelper::GetOptionalArg<OperatorDef, int>(
*operator_def, "T", static_cast<int>(DT_FLOAT));
const DataType dtype = static_cast<DataType>(
ProtoArgHelper::GetOptionalArg<OperatorDef, int>(
*operator_def, "T", static_cast<int>(DT_FLOAT)));
const int op_mode_i = ProtoArgHelper::GetOptionalArg<OperatorDef, int>(
*operator_def, "mode", static_cast<int>(NetMode::NORMAL));
const NetMode op_mode = static_cast<NetMode>(op_mode_i);
......@@ -120,15 +177,20 @@ std::unique_ptr<Operation> OpRegistryBase::CreateOperation(
<< operator_def->type() << "<" << dtype << ">" << ") on "
<< device_type;
if (op_mode == mode) {
return registry_.Create(
OpKeyBuilder(operator_def->type().data())
.Device(device_type)
.TypeConstraint("T", static_cast<DataType>(dtype))
.Build(),
context);
const std::string op_type = context->operator_def()->type();
MACE_CHECK(registry_.count(op_type) != 0,
op_type, " operation is not registered.");
std::string key = OpKeyBuilder(op_type)
.Device(device_type)
.TypeConstraint("T", dtype)
.Build();
if (registry_.at(op_type)->creators.count(key) == 0) {
LOG(FATAL) << "Key not registered: " << key;
}
return registry_.at(op_type)->creators.at(key)(context);
} else {
return nullptr;
}
}
} // namespace mace
......@@ -16,13 +16,13 @@
#define MACE_CORE_OPERATOR_H_
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>
#include <map>
#include "mace/core/arg_helper.h"
#include "mace/core/op_context.h"
#include "mace/core/registry.h"
#include "mace/core/tensor.h"
#include "mace/core/workspace.h"
#include "mace/proto/mace.pb.h"
......@@ -160,62 +160,57 @@ class Operation {
#define MACE_OP_OUTPUT_TAGS(first_input, ...) \
enum _OutputTags { first_input = 0, __VA_ARGS__ }
class OpKeyBuilder {
public:
explicit OpKeyBuilder(const char *op_name);
OpKeyBuilder &Device(DeviceType device);
struct OpRegistrationInfo {
public:
typedef std::function<std::unique_ptr<Operation>(OpConstructContext *)>
OpCreator;
OpKeyBuilder &TypeConstraint(const char *attr_name,
DataType allowed);
OpRegistrationInfo() = default;
template <typename T>
OpKeyBuilder &TypeConstraint(const char *attr_name);
void AddDevice(DeviceType);
const std::string Build();
void Register(const std::string &key, OpCreator creator);
private:
std::string op_name_;
DeviceType device_type_;
std::map<std::string, DataType> type_constraint_;
std::set<DeviceType> devices;
std::unordered_map<std::string, OpCreator> creators;
};
template <typename T>
OpKeyBuilder &OpKeyBuilder::TypeConstraint(const char *attr_name) {
return this->TypeConstraint(attr_name, DataTypeToEnum<T>::value);
}
class OpRegistryBase {
public:
typedef Registry<std::string,
Operation,
OpConstructContext *>
RegistryType;
OpRegistryBase() = default;
virtual ~OpRegistryBase();
RegistryType *registry() { return &registry_; }
virtual ~OpRegistryBase() = default;
MaceStatus Register(const std::string &op_type,
const DeviceType device_type,
const DataType dt,
OpRegistrationInfo::OpCreator creator);
const std::set<DeviceType> AvailableDevices(
const std::string &op_type) const;
std::unique_ptr<Operation> CreateOperation(
OpConstructContext *context,
DeviceType device_type,
const NetMode mode) const;
template <class DerivedType>
static std::unique_ptr<Operation> DefaultCreator(
OpConstructContext *context) {
return std::unique_ptr<Operation>(new DerivedType(context));
}
private:
RegistryType registry_;
std::unordered_map<
std::string,
std::unique_ptr<OpRegistrationInfo>> registry_;
MACE_DISABLE_COPY_AND_ASSIGN(OpRegistryBase);
};
MACE_DECLARE_REGISTRY(OpRegistry,
Operation,
OpConstructContext *);
#define MACE_REGISTER_OP(op_registry, op_type, class_name, device, dt) \
MACE_REGISTER_CLASS(OpRegistry, \
op_registry->registry(), \
OpKeyBuilder(op_type) \
.Device(device) \
.TypeConstraint<dt>("T") \
.Build(), \
class_name<device, dt>)
op_registry->Register(op_type, \
device, \
DataTypeToEnum<dt>::value, \
OpRegistryBase::DefaultCreator<class_name<device, dt>>)
} // namespace mace
......
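For reference, the new `MACE_REGISTER_OP` macro no longer goes through the generic `Registry`/`Registerer` machinery; it calls `OpRegistryBase::Register` directly. Given the macro definition above, a registration line like the one in the custom-op documentation expands roughly as follows (illustrative expansion, assuming a `MyCustomOp` class template as in that documentation; not standalone code):
```c++
// MACE_REGISTER_OP(op_registry, "MyCustomOp", MyCustomOp, DeviceType::CPU, float);
// expands to:
op_registry->Register(
    "MyCustomOp",
    DeviceType::CPU,
    DataTypeToEnum<float>::value,
    OpRegistryBase::DefaultCreator<MyCustomOp<DeviceType::CPU, float>>);
```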
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_CORE_REGISTRY_H_
#define MACE_CORE_REGISTRY_H_
#include <functional>
#include <map>
#include <memory>
#include <mutex> // NOLINT(build/c++11)
#include <string>
#include <vector>
#include "mace/utils/logging.h"
namespace mace {
template <class SrcType, class ObjectType, class... Args>
class Registry {
public:
typedef std::function<std::unique_ptr<ObjectType>(Args...)> Creator;
Registry() : registry_() {}
void Register(const SrcType &key, Creator creator) {
VLOG(3) << "Registering: " << key;
std::lock_guard<std::mutex> lock(register_mutex_);
MACE_CHECK(registry_.count(key) == 0, "Key already registered: ", key);
registry_[key] = creator;
}
std::unique_ptr<ObjectType> Create(const SrcType &key, Args... args) const {
if (registry_.count(key) == 0) {
LOG(FATAL) << "Key not registered: " << key;
}
return registry_.at(key)(args...);
}
private:
std::map<SrcType, Creator> registry_;
std::mutex register_mutex_;
MACE_DISABLE_COPY_AND_ASSIGN(Registry);
};
template <class SrcType, class ObjectType, class... Args>
class Registerer {
public:
Registerer(const SrcType &key,
Registry<SrcType, ObjectType, Args...> *registry,
typename Registry<SrcType, ObjectType, Args...>::Creator creator) {
registry->Register(key, creator);
}
template <class DerivedType>
static std::unique_ptr<ObjectType> DefaultCreator(Args... args) {
return std::unique_ptr<ObjectType>(new DerivedType(args...));
}
};
#define MACE_CONCATENATE_IMPL(s1, s2) s1##s2
#define MACE_CONCATENATE(s1, s2) MACE_CONCATENATE_IMPL(s1, s2)
#ifdef __COUNTER__
#define MACE_ANONYMOUS_VARIABLE(str) MACE_CONCATENATE(str, __COUNTER__)
#else
#define MACE_ANONYMOUS_VARIABLE(str) MACE_CONCATENATE(str, __LINE__)
#endif
#define MACE_DECLARE_TYPED_REGISTRY(RegistryName, SrcType, ObjectType, ...) \
typedef Registerer<SrcType, ObjectType, ##__VA_ARGS__> \
Registerer##RegistryName;
#define MACE_DECLARE_REGISTRY(RegistryName, ObjectType, ...) \
MACE_DECLARE_TYPED_REGISTRY(RegistryName, std::string, ObjectType, \
##__VA_ARGS__)
#define MACE_REGISTER_TYPED_CLASS(RegistryName, registry, key, ...) \
Registerer##RegistryName MACE_ANONYMOUS_VARIABLE(RegistryName)( \
key, registry, Registerer##RegistryName::DefaultCreator<__VA_ARGS__>);
#define MACE_REGISTER_CLASS(RegistryName, registry, key, ...) \
MACE_REGISTER_TYPED_CLASS(RegistryName, registry, key, __VA_ARGS__)
} // namespace mace
#endif // MACE_CORE_REGISTRY_H_
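As a usage note for the generic `Registry`/`Registerer` templates above: any key type and any set of factory arguments can be plugged in. The sketch below uses made-up `Shape`/`Circle` types purely for illustration and assumes the `mace/core/registry.h` shown above is on the include path; it is not part of MACE.
```c++
#include <iostream>
#include <memory>
#include <string>

#include "mace/core/registry.h"  // the template shown above

// Toy hierarchy, for illustration only.
struct Shape {
  virtual ~Shape() = default;
  virtual double Area() const = 0;
};
struct Circle : Shape {
  explicit Circle(double r) : r_(r) {}
  double Area() const override { return 3.14159 * r_ * r_; }
  double r_;
};

int main() {
  // Key type: std::string, object type: Shape, creator argument: double.
  mace::Registry<std::string, Shape, double> registry;
  registry.Register(
      "Circle",
      mace::Registerer<std::string, Shape, double>::DefaultCreator<Circle>);
  std::unique_ptr<Shape> circle = registry.Create("Circle", 2.0);
  std::cout << circle->Area() << std::endl;  // prints ~12.566
  return 0;
}
```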
......@@ -15,10 +15,11 @@
#ifndef MACE_CORE_TENSOR_H_
#define MACE_CORE_TENSOR_H_
#include <algorithm>
#include <functional>
#include <numeric>
#include <string>
#include <vector>
#include <functional>
#include <algorithm>
#include "mace/core/buffer.h"
#include "mace/core/preallocated_pooled_allocator.h"
......
# Description:
# Mace neon kernels.
#
package(
default_visibility = ["//visibility:public"],
)
licenses(["notice"]) # Apache 2.0
load(
"//mace:mace.bzl",
"if_android",
"if_neon_enabled",
"if_openmp_enabled",
"if_android_armv7",
"if_hexagon_enabled",
"if_opencl_enabled",
)
cc_library(
name = "kernels",
srcs = glob(
[
"*.cc",
"arm/*.cc",
],
exclude = [
"*_test.cc",
"*_benchmark.cc",
"arm/*_test.cc",
"buffer_inverse_transform.cc",
"buffer_transform.cc",
"lstm_cell.cc",
"winograd_transform.cc",
],
) + if_opencl_enabled(glob(
[
"opencl/*.cc",
"opencl/image/*.cc",
"opencl/buffer/*.cc",
"buffer_inverse_transform.cc",
"buffer_transform.cc",
"lstm_cell.cc",
"winograd_transform.cc",
],
exclude = [
"opencl/*_test.cc",
],
)),
hdrs = glob(
[
"*.h",
"arm/*.h",
],
) + if_opencl_enabled(glob([
"opencl/*.h",
"opencl/image/*.h",
"opencl/buffer/*.h",
])),
copts = [
"-Werror",
"-Wextra",
"-Wno-missing-field-initializers",
] + if_openmp_enabled([
"-fopenmp",
]) + if_neon_enabled([
"-DMACE_ENABLE_NEON",
]) + if_android_armv7([
"-mfpu=neon",
]) + if_android_armv7([
"-mfloat-abi=softfp",
]) + if_opencl_enabled([
"-DMACE_ENABLE_OPENCL",
]) + if_hexagon_enabled([
"-DMACE_ENABLE_HEXAGON",
]),
linkopts = if_android(["-lm"]),
deps = [
"//mace/core",
"@gemmlowp",
"@tflite",
],
)
cc_test(
name = "kernels_test",
testonly = 1,
srcs = glob(
[
"*_test.cc",
"arm/*_test.cc",
"opencl/*_test.cc",
],
),
copts = [
"-Werror",
"-Wextra",
"-Wno-missing-field-initializers",
] + if_openmp_enabled([
"-fopenmp",
]) + if_neon_enabled([
"-DMACE_ENABLE_NEON",
]) + if_android_armv7([
"-mfpu=neon",
"-mfloat-abi=softfp",
]) + if_opencl_enabled([
"-DMACE_ENABLE_OPENCL",
]) + if_hexagon_enabled([
"-DMACE_ENABLE_HEXAGON",
]),
linkopts = ["-fopenmp"],
linkstatic = 1,
deps = [
":kernels",
"//mace/ops",
"@gtest",
"@gtest//:gtest_main",
],
)
cc_test(
name = "kernels_benchmark",
testonly = 1,
srcs = glob(["*_benchmark.cc"]),
copts = [
"-Werror",
"-Wextra",
"-Wno-missing-field-initializers",
] + if_openmp_enabled([
"-fopenmp",
]) + if_neon_enabled([
"-DMACE_ENABLE_NEON",
]) + if_android_armv7([
"-mfpu=neon",
"-mfloat-abi=softfp",
]) + if_opencl_enabled([
"-DMACE_ENABLE_OPENCL",
]) + if_hexagon_enabled([
"-DMACE_ENABLE_HEXAGON",
]),
linkopts = ["-fopenmp"],
linkstatic = 1,
deps = [
":kernels",
"//mace/core:test_benchmark_main",
"//mace/ops",
"//third_party/eigen3",
"@gemmlowp",
],
)
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <Eigen/Dense>
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>
#include "public/gemmlowp.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/kernels/gemm.h"
#include "mace/kernels/sgemm.h"
#include "mace/ops/ops_test_util.h"
namespace gemmlowp {
template<typename tScalar, MapOrder tOrder>
class Matrix : public MatrixMap<tScalar, tOrder> {
public:
typedef MatrixMap<tScalar, tOrder> Map;
typedef MatrixMap<const tScalar, tOrder> ConstMap;
typedef typename Map::Scalar Scalar;
static const MapOrder Order = tOrder;
using Map::cols_;
using Map::data_;
using Map::kOrder;
using Map::rows_;
using Map::stride_;
public:
Matrix() : Map(nullptr, 0, 0, 0) {}
Matrix(int rows, int cols) : Map(nullptr, 0, 0, 0) { Resize(rows, cols); }
Matrix(const Matrix &other) : Map(nullptr, 0, 0, 0) { *this = other; }
Matrix &operator=(const Matrix &other) {
Resize(other.rows_, other.cols_);
std::memcpy(data_, other.data_, size() * sizeof(Scalar));
return *this;
}
friend bool operator==(const Matrix &a, const Matrix &b) {
return a.rows_ == b.rows_ && a.cols_ == b.cols_ &&
!std::memcmp(a.data_, b.data_, a.size());
}
void Resize(int rows, int cols) {
rows_ = rows;
cols_ = cols;
stride_ = kOrder == gemmlowp::MapOrder::ColMajor ? rows : cols;
storage.resize(size());
data_ = storage.data();
}
int size() const { return rows_ * cols_; }
Map &map() { return *static_cast<Map *>(this); }
ConstMap const_map() const { return ConstMap(data_, rows_, cols_, stride_); }
protected:
std::vector<Scalar> storage;
};
template<typename MatrixType>
void MakeZero(MatrixType *m) {
for (int c = 0; c < m->cols(); c++) {
for (int r = 0; r < m->rows(); r++) {
(*m)(r, c) = 128;
}
}
}
} // namespace gemmlowp
namespace mace {
namespace kernels {
namespace test {
// Test the speed of different access order of a NHWC buffer
namespace {
// Matmul with (m, k) x (k, n)
void MatmulBenchmark_Mace(int iters, int m, int k, int n) {
mace::testing::StopTiming();
std::vector<float> lhs(m * k);
std::vector<float> rhs(k * n);
std::vector<float> result(m * n);
// warm up
Gemm(lhs.data(), rhs.data(), 1, m, k, n, result.data());
mace::testing::StartTiming();
while (iters--) {
Gemm(lhs.data(), rhs.data(), 1, m, k, n, result.data());
}
}
void MatmulBenchmark_Mace_SGemm(int iters, int m, int k, int n) {
mace::testing::StopTiming();
std::vector<float> lhs(m * k);
std::vector<float> rhs(k * n);
std::vector<float> result(m * n);
kernels::MatrixMap<const float> matrix_lhs(1, m, k, RowMajor, lhs.data(),
true);
kernels::MatrixMap<const float> matrix_rhs(1, k, n, RowMajor, rhs.data(),
true);
kernels::MatrixMap<float> matrix_result(1, m, n, RowMajor, result.data());
kernels::SGemm sgemm;
sgemm(matrix_lhs, matrix_rhs, &matrix_result);
mace::testing::StartTiming();
while (iters--) {
sgemm(matrix_lhs, matrix_rhs, &matrix_result);
}
}
void MatmulBenchmark_Eigen(int iters, int m, int k, int n) {
mace::testing::StopTiming();
Eigen::MatrixXf lhs = Eigen::MatrixXf::Random(m, k);
Eigen::MatrixXf rhs = Eigen::MatrixXf::Random(k, n);
Eigen::MatrixXf result = Eigen::MatrixXf::Zero(m, n);
// warm up
result = lhs * rhs;
mace::testing::StartTiming();
while (iters--) {
result = lhs * rhs;
}
}
void MatmulBenchmark_gemmlowp_uint8(int iters, int rows, int depth, int cols) {
mace::testing::StopTiming();
gemmlowp::Matrix<std::uint8_t, gemmlowp::MapOrder::RowMajor> lhs;
gemmlowp::Matrix<std::uint8_t, gemmlowp::MapOrder::ColMajor> rhs;
gemmlowp::Matrix<std::uint8_t, gemmlowp::MapOrder::ColMajor> result;
lhs.Resize(rows, depth);
rhs.Resize(depth, cols);
result.Resize(rows, cols);
gemmlowp::MakeZero(&lhs);
gemmlowp::MakeZero(&rhs);
gemmlowp::MakeZero(&result);
gemmlowp::OutputStageQuantizeDownInt32ByFixedPoint quantize_down_stage;
quantize_down_stage.result_offset_after_shift = 128;
quantize_down_stage.result_fixedpoint_multiplier = 1234567890;
quantize_down_stage.result_shift = 16;
gemmlowp::OutputStageSaturatingCastToUint8 saturating_cast_stage;
const auto output_pipeline =
std::make_tuple(quantize_down_stage, saturating_cast_stage);
auto gemm_context =
mace::ops::test::OpTestContext::Get()
->GetDevice(CPU)->cpu_runtime()->GetGemmlowpContext();
MACE_CHECK_NOTNULL(gemm_context);
using BitDepthParams = gemmlowp::L8R8WithLhsNonzeroBitDepthParams;
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::uint8_t, BitDepthParams>(
gemm_context, lhs.const_map(), rhs.const_map(), &result.map(), -128,
-128, output_pipeline);
mace::testing::StartTiming();
while (iters--) {
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::uint8_t,
BitDepthParams>(
gemm_context, lhs.const_map(), rhs.const_map(), &result.map(), -128,
-128, output_pipeline);
}
}
void MatmulBenchmark_gemmlowp_int32(int iters, int rows, int depth, int cols) {
mace::testing::StopTiming();
gemmlowp::Matrix<std::uint8_t, gemmlowp::MapOrder::RowMajor> lhs;
gemmlowp::Matrix<std::uint8_t, gemmlowp::MapOrder::ColMajor> rhs;
gemmlowp::Matrix<std::int32_t, gemmlowp::MapOrder::ColMajor> result;
lhs.Resize(rows, depth);
rhs.Resize(depth, cols);
result.Resize(rows, cols);
gemmlowp::MakeZero(&lhs);
gemmlowp::MakeZero(&rhs);
gemmlowp::MakeZero(&result);
const auto output_pipeline = std::make_tuple();
auto gemm_context =
mace::ops::test::OpTestContext::Get()
->GetDevice(CPU)->cpu_runtime()->GetGemmlowpContext();
MACE_CHECK_NOTNULL(gemm_context);
using BitDepthParams = gemmlowp::L8R8WithLhsNonzeroBitDepthParams;
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::int32_t, BitDepthParams>(
gemm_context, lhs.const_map(), rhs.const_map(), &result.map(), -128,
-128, output_pipeline);
mace::testing::StartTiming();
while (iters--) {
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::int32_t,
BitDepthParams>(
gemm_context, lhs.const_map(), rhs.const_map(), &result.map(), -128,
-128, output_pipeline);
}
}
} // namespace
#define MACE_BM_MATMUL_FUNC(M, K, N, FUNC, TYPE) \
static void MACE_BM_MATMUL_##M##_##K##_##N##_##FUNC(int iters) { \
const int64_t macc = static_cast<int64_t>(iters) * M * K * N; \
const int64_t tot = static_cast<int64_t>(iters) * (M + N) * K; \
mace::testing::MaccProcessed(macc); \
mace::testing::BytesProcessed(tot * sizeof(TYPE)); \
MatmulBenchmark_##FUNC(iters, M, K, N); \
} \
MACE_BENCHMARK(MACE_BM_MATMUL_##M##_##K##_##N##_##FUNC)
#define MACE_BM_MATMUL(M, K, N) \
MACE_BM_MATMUL_FUNC(M, K, N, Mace, float); \
MACE_BM_MATMUL_FUNC(M, K, N, Mace_SGemm, float); \
MACE_BM_MATMUL_FUNC(M, K, N, Eigen, float); \
MACE_BM_MATMUL_FUNC(M, K, N, gemmlowp_uint8, uint8_t); \
MACE_BM_MATMUL_FUNC(M, K, N, gemmlowp_int32, uint8_t);
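// Note: each MACE_BM_MATMUL(M, K, N) line below expands, via
// MACE_BM_MATMUL_FUNC, into one benchmark per backend. For example,
// MACE_BM_MATMUL(7, 384, 384) defines MACE_BM_MATMUL_7_384_384_Mace,
// ..._Mace_SGemm, ..._Eigen, ..._gemmlowp_uint8 and ..._gemmlowp_int32, each
// reporting macc = iters * M * K * N and bytes = iters * (M + N) * K *
// sizeof(TYPE) before timing the corresponding MatmulBenchmark_* function.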
// Embedding size 384
MACE_BM_MATMUL(7, 384, 384);
MACE_BM_MATMUL(7, 384, 1536);
MACE_BM_MATMUL(7, 1536, 384);
MACE_BM_MATMUL(15, 384, 384);
MACE_BM_MATMUL(15, 384, 1536);
MACE_BM_MATMUL(15, 1536, 384);
MACE_BM_MATMUL(1, 256, 256);
MACE_BM_MATMUL(1, 256, 1536);
MACE_BM_MATMUL(1, 1536, 256);
MACE_BM_MATMUL(256, 256, 1);
MACE_BM_MATMUL(1536, 256, 1);
MACE_BM_MATMUL(256, 1536, 1);
MACE_BM_MATMUL(29792, 256, 1);
MACE_BM_MATMUL(1, 256, 29792);
MACE_BM_MATMUL(2, 256, 256);
MACE_BM_MATMUL(2, 256, 1536);
MACE_BM_MATMUL(2, 1536, 256);
MACE_BM_MATMUL(3, 256, 256);
MACE_BM_MATMUL(3, 256, 1536);
MACE_BM_MATMUL(3, 1536, 256);
MACE_BM_MATMUL(4, 256, 256);
MACE_BM_MATMUL(4, 256, 1536);
MACE_BM_MATMUL(4, 1536, 256);
MACE_BM_MATMUL(8, 256, 256);
MACE_BM_MATMUL(8, 256, 1536);
MACE_BM_MATMUL(8, 1536, 256);
MACE_BM_MATMUL(10, 256, 256);
MACE_BM_MATMUL(10, 256, 1536);
MACE_BM_MATMUL(10, 1536, 256);
MACE_BM_MATMUL(15, 256, 256);
MACE_BM_MATMUL(15, 256, 1536);
MACE_BM_MATMUL(15, 1536, 256);
// Embedding size 128
MACE_BM_MATMUL(1, 128, 1536);
MACE_BM_MATMUL(1, 128, 44678);
// MobileNet
MACE_BM_MATMUL(128, 128, 3136);
MACE_BM_MATMUL(256, 256, 784);
MACE_BM_MATMUL(512, 512, 196);
MACE_BM_MATMUL(1024, 1024, 49);
} // namespace test
} // namespace kernels
} // namespace mace
......@@ -40,7 +40,6 @@ cc_library(
deps = [
"//mace/public",
"//mace/ops",
"//mace/kernels",
],
alwayslink = 1,
)
......@@ -79,7 +78,7 @@ genrule(
srcs = [
"//mace/codegen:generated_version",
"//mace/core",
"//mace/kernels",
"//mace/ops:internal_ops",
"//mace/ops",
"//mace/libmace",
"//mace/utils",
......@@ -93,7 +92,7 @@ genrule(
"mri_stream=$$(python $(location //mace/python/tools:archive_static_lib) " +
"$(locations //mace/codegen:generated_version) " +
"$(locations //mace/core:core) " +
"$(locations //mace/kernels:kernels) " +
"$(locations //mace/ops:internal_ops) " +
"$(locations //mace/ops:ops) " +
"$(locations //mace/libmace:libmace) " +
"$(locations //mace/utils:utils) " +
......
......@@ -22,8 +22,7 @@
#include "mace/core/net.h"
#include "mace/core/device_context.h"
#include "mace/kernels/ops_register.h"
#include "mace/ops/ops_def_register.h"
#include "mace/ops/ops_registry.h"
#include "mace/public/mace.h"
#ifdef MACE_ENABLE_OPENCL
......@@ -359,7 +358,6 @@ class MaceEngine::Impl {
private:
const unsigned char *model_data_;
size_t model_data_size_;
std::unique_ptr<OpDefRegistryBase> op_def_registry_;
std::unique_ptr<OpRegistryBase> op_registry_;
DeviceType device_type_;
std::unique_ptr<Device> device_;
......@@ -377,7 +375,6 @@ class MaceEngine::Impl {
MaceEngine::Impl::Impl(const MaceEngineConfig &config)
: model_data_(nullptr),
model_data_size_(0),
op_def_registry_(new OpDefRegistry()),
op_registry_(new OpRegistry),
device_type_(config.impl_->device_type()),
device_(nullptr),
......@@ -466,7 +463,6 @@ MaceStatus MaceEngine::Impl::Init(
// Init model
auto net = std::unique_ptr<NetBase>(new SerialNet(
op_def_registry_.get(),
op_registry_.get(),
net_def,
ws_.get(),
......@@ -474,8 +470,7 @@ MaceStatus MaceEngine::Impl::Init(
NetMode::INIT));
MACE_RETURN_IF_ERROR(net->Init());
MACE_RETURN_IF_ERROR(net->Run());
net_ = std::unique_ptr<NetBase>(new SerialNet(op_def_registry_.get(),
op_registry_.get(),
net_ = std::unique_ptr<NetBase>(new SerialNet(op_registry_.get(),
net_def,
ws_.get(),
device_.get()));
......
# Description:
# Mace operators.
#
package(
default_visibility = ["//visibility:public"],
......@@ -18,18 +17,58 @@ load(
)
cc_library(
name = "test",
testonly = 1,
hdrs = glob([
"*_test_util.h",
]),
srcs = [
"ops_test_util.cc",
],
name = "internal_ops",
srcs = glob(
[
"*.cc",
"arm/*.cc",
],
exclude = [
"*_test.cc",
"*_benchmark.cc",
"arm/*_test.cc",
"ops_registry.cc",
"ops_test_util.cc",
"buffer_inverse_transform.cc",
"buffer_transform.cc",
"lstm_cell.cc",
"winograd_transform.cc",
],
) + if_opencl_enabled(glob(
[
"opencl/*.cc",
"opencl/image/*.cc",
"opencl/buffer/*.cc",
"buffer_inverse_transform.cc",
"buffer_transform.cc",
"lstm_cell.cc",
"winograd_transform.cc",
],
exclude = [
"opencl/*_test.cc",
],
)),
hdrs = glob(
[
"*.h",
"arm/*.h",
],
exclude = [
"ops_registry.h",
"ops_test_util.h",
]
) + if_opencl_enabled(glob([
"opencl/*.h",
"opencl/image/*.h",
"opencl/buffer/*.h",
])),
copts = [
"-Werror",
"-Wextra",
] + if_openmp_enabled(["-fopenmp"]) + if_neon_enabled([
"-Wno-missing-field-initializers",
] + if_openmp_enabled([
"-fopenmp",
]) + if_neon_enabled([
"-DMACE_ENABLE_NEON",
]) + if_android_armv7([
"-mfpu=neon",
......@@ -40,20 +79,54 @@ cc_library(
]) + if_hexagon_enabled([
"-DMACE_ENABLE_HEXAGON",
]),
linkopts = if_android(["-lm"]),
deps = [
"ops",
"//mace/kernels",
"@gtest",
"//mace/core",
"@gemmlowp",
"@tflite",
],
)
cc_library(
name = "ops",
srcs = [
"ops_def_register.cc",
"ops_registry.cc"
],
hdrs = [
"ops_def_register.h",
"ops_registry.h",
],
copts = [
"-Werror",
"-Wextra",
"-Wno-missing-field-initializers",
] + if_openmp_enabled([
"-fopenmp",
]) + if_neon_enabled([
"-DMACE_ENABLE_NEON",
]) + if_android_armv7([
"-mfpu=neon",
]) + if_android_armv7([
"-mfloat-abi=softfp",
]) + if_opencl_enabled([
"-DMACE_ENABLE_OPENCL",
]) + if_hexagon_enabled([
"-DMACE_ENABLE_HEXAGON",
]),
linkopts = if_android(["-lm"]),
deps = [
"internal_ops",
],
)
cc_library(
name = "test",
testonly = 1,
hdrs = glob([
"*_test_util.h",
]),
srcs = [
"ops_test_util.cc",
],
copts = [
"-Werror",
......@@ -70,7 +143,8 @@ cc_library(
"-DMACE_ENABLE_HEXAGON",
]),
deps = [
"//mace/core",
"ops",
"@gtest",
],
)
......@@ -78,16 +152,22 @@ cc_test(
name = "ops_test",
testonly = 1,
srcs = glob(
["*_test.cc"],
[
"*_test.cc",
"arm/*_test.cc",
"opencl/*_test.cc",
],
),
copts = [
"-Werror",
"-Wextra",
] + if_openmp_enabled(["-fopenmp"]) + if_neon_enabled([
"-Wno-missing-field-initializers",
] + if_openmp_enabled([
"-fopenmp",
]) + if_neon_enabled([
"-DMACE_ENABLE_NEON",
]) + if_android_armv7([
"-mfpu=neon",
]) + if_android_armv7([
"-mfloat-abi=softfp",
]) + if_opencl_enabled([
"-DMACE_ENABLE_OPENCL",
......@@ -97,8 +177,7 @@ cc_test(
linkopts = ["-fopenmp"],
linkstatic = 1,
deps = [
":ops",
":test",
"test",
"@gtest//:gtest_main",
],
)
......@@ -110,11 +189,13 @@ cc_test(
copts = [
"-Werror",
"-Wextra",
] + if_openmp_enabled(["-fopenmp"]) + if_neon_enabled([
"-Wno-missing-field-initializers",
] + if_openmp_enabled([
"-fopenmp",
]) + if_neon_enabled([
"-DMACE_ENABLE_NEON",
]) + if_android_armv7([
"-mfpu=neon",
]) + if_android_armv7([
"-mfloat-abi=softfp",
]) + if_opencl_enabled([
"-DMACE_ENABLE_OPENCL",
......@@ -124,8 +205,9 @@ cc_test(
linkopts = ["-fopenmp"],
linkstatic = 1,
deps = [
":ops",
":test",
"test",
"//mace/core:test_benchmark_main",
"//third_party/eigen3",
"@gemmlowp",
],
)
......@@ -12,18 +12,18 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/kernels/activation.h"
#include "mace/ops/activation.h"
#include <memory>
#include "mace/core/operator.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/activation.h"
#include "mace/ops/opencl/image/activation.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class ActivationOp;
......@@ -33,7 +33,7 @@ class ActivationOp<DeviceType::CPU, float> : public Operation {
public:
explicit ActivationOp(OpConstructContext *context)
: Operation(context),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit",
......@@ -74,7 +74,7 @@ class ActivationOp<DeviceType::GPU, T> : public Operation {
public:
explicit ActivationOp(OpConstructContext *context)
: Operation(context) {
ActivationType type = kernels::StringToActivationType(
ActivationType type = ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"));
auto relux_max_limit = static_cast<T>(
......@@ -114,5 +114,5 @@ void RegisterActivation(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,19 +12,19 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_ACTIVATION_H_
#define MACE_KERNELS_ACTIVATION_H_
#ifndef MACE_OPS_ACTIVATION_H_
#define MACE_OPS_ACTIVATION_H_
#include <algorithm>
#include <cmath>
#include <string>
#include "mace/core/types.h"
#include "mace/kernels/arm/activation_neon.h"
#include "mace/ops/arm/activation_neon.h"
#include "mace/utils/logging.h"
namespace mace {
namespace kernels {
namespace ops {
enum ActivationType {
NOOP = 0,
......@@ -149,7 +149,7 @@ void PReLUActivation(const T *input_ptr,
}
}
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_ACTIVATION_H_
#endif // MACE_OPS_ACTIVATION_H_
......@@ -14,7 +14,6 @@
#include <string>
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -46,7 +45,7 @@ void ReluBenchmark(int iters, int batch, int channels, int height, int width) {
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "ReluBM")
.Input("InputImage")
......@@ -108,7 +107,7 @@ void ReluxBenchmark(int iters, int batch, int channels, int height, int width) {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "ReluxBM")
.Input("InputImage")
......@@ -186,9 +185,9 @@ void PreluBenchmark(int iters, int batch, int channels, int height, int width) {
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Alpha", "AlphaImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Activation", "PreluBM")
.Input("InputImage")
......@@ -251,7 +250,7 @@ void TanhBenchmark(int iters, int batch, int channels, int height, int width) {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "TanhBM")
.Input("InputImage")
......@@ -318,7 +317,7 @@ void SigmoidBenchmark(
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "SigmoidBM")
.Input("InputImage")
......
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -33,7 +32,7 @@ void TestSimpleRelu() {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "ReluTest")
.Input("InputImage")
......@@ -46,7 +45,7 @@ void TestSimpleRelu() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("Activation", "ReluTest")
.Input("Input")
......@@ -81,7 +80,7 @@ void TestUnalignedSimpleRelu() {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "ReluTest")
.Input("InputImage")
......@@ -94,7 +93,7 @@ void TestUnalignedSimpleRelu() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("Activation", "ReluTest")
.Input("Input")
......@@ -132,7 +131,7 @@ void TestSimpleRelux() {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "ReluxTest")
.Input("InputImage")
......@@ -146,7 +145,7 @@ void TestSimpleRelux() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("Activation", "ReluxTest")
.Input("Input")
......@@ -182,7 +181,7 @@ void TestSimpleReluRelux() {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "ReluxTest")
.Input("InputImage")
......@@ -196,7 +195,7 @@ void TestSimpleReluRelux() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("Activation", "ReluxTest")
.Input("Input")
......@@ -237,9 +236,9 @@ void TestSimplePrelu() {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Alpha", "AlphaImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Activation", "PreluTest")
.Input("InputImage")
......@@ -253,7 +252,7 @@ void TestSimplePrelu() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("Activation", "PreluTest")
.Input("Input")
......@@ -293,7 +292,7 @@ void TestSimpleTanh() {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "TanhTest")
.Input("InputImage")
......@@ -306,7 +305,7 @@ void TestSimpleTanh() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("Activation", "TanhTest")
.Input("Input")
......@@ -346,7 +345,7 @@ void TestSimpleSigmoid() {
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Activation", "SigmoidTest")
.Input("InputImage")
......@@ -359,7 +358,7 @@ void TestSimpleSigmoid() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("Activation", "SigmoidTest")
.Input("Input")
......
......@@ -22,11 +22,11 @@
#include "mace/core/operator.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/addn.h"
#include "mace/ops/opencl/image/addn.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
static constexpr int kCostPerGroup = 1024;
......@@ -142,5 +142,5 @@ void RegisterAddN(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -14,7 +14,6 @@
#include <string>
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -37,7 +36,7 @@ void AddNBenchmark(int iters, int inputs, int n, int h, int w, int c) {
for (int i = 0; i < inputs; ++i) {
BufferToImage<D, T>(&net, MakeString("Input", i).c_str(),
MakeString("InputImage", i).c_str(),
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
}
OpDefBuilder op_def_builder("AddN", "AddNBM");
for (int i = 0; i < inputs; ++i) {
......
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -69,7 +68,7 @@ void SimpleAdd3() {
for (int i = 0; i < input_num; ++i) {
BufferToImage<D, half>(&net, MakeString("Input", i),
MakeString("InputImage", i),
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
}
auto op_def_cl = OpDefBuilder("AddN", "AddNTest");
......@@ -84,7 +83,7 @@ void SimpleAdd3() {
net.RunOp(D);
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
OpDefBuilder("AddN", "AddNTest")
.Input("Input0")
......@@ -143,7 +142,7 @@ void RandomTest() {
for (int i = 0; i < input_num; ++i) {
BufferToImage<D, half>(&net, MakeString("Input", i),
MakeString("InputImage", i),
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
}
auto op_def_cl = OpDefBuilder("AddN", "AddNTest");
......@@ -158,7 +157,7 @@ void RandomTest() {
net.RunOp(D);
ImageToBuffer<D, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-2,
1e-2);
......
......@@ -21,7 +21,7 @@
#include "mace/core/operator.h"
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class ArgMaxOp : public Operation {
......@@ -84,5 +84,5 @@ void RegisterArgMax(OpRegistryBase *op_registry) {
DeviceType::CPU, float);
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......
......@@ -17,10 +17,10 @@
#endif
#include <algorithm>
#include "mace/kernels/arm/activation_neon.h"
#include "mace/ops/arm/activation_neon.h"
namespace mace {
namespace kernels {
namespace ops {
void ReluNeon(const float *input, const index_t size, float *output) {
#if defined(MACE_ENABLE_NEON)
......@@ -67,5 +67,5 @@ void ReluxNeon(const float *input, const float limit,
#endif
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,20 +12,20 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_ARM_ACTIVATION_NEON_H_
#define MACE_KERNELS_ARM_ACTIVATION_NEON_H_
#ifndef MACE_OPS_ARM_ACTIVATION_NEON_H_
#define MACE_OPS_ARM_ACTIVATION_NEON_H_
#include "mace/core/types.h"
namespace mace {
namespace kernels {
namespace ops {
void ReluNeon(const float *input, const index_t size, float *output);
void ReluxNeon(const float *input, const float limit,
const index_t size, float *output);
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_ARM_ACTIVATION_NEON_H_
#endif // MACE_OPS_ARM_ACTIVATION_NEON_H_
......@@ -12,14 +12,14 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_ARM_CONV_2D_NEON_H_
#define MACE_KERNELS_ARM_CONV_2D_NEON_H_
#ifndef MACE_OPS_ARM_CONV_2D_NEON_H_
#define MACE_OPS_ARM_CONV_2D_NEON_H_
#include "mace/core/types.h"
#include "mace/kernels/sgemm.h"
#include "mace/ops/sgemm.h"
namespace mace {
namespace kernels {
namespace ops {
void Conv2dNeonK1x1S1(const float *input,
const float *filter,
......@@ -115,7 +115,7 @@ inline void Conv2dCPUKHxKWCalc(const float *in_ptr,
}
}
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_ARM_CONV_2D_NEON_H_
#endif // MACE_OPS_ARM_CONV_2D_NEON_H_
......@@ -16,11 +16,11 @@
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
#include "mace/utils/utils.h"
namespace mace {
namespace kernels {
namespace ops {
inline void Conv2dCPUK15x1Calc(const float *in_ptr,
const float *filter_ptr,
......@@ -157,5 +157,5 @@ void Conv2dNeonK15x1S1(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,10 +12,10 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
void Conv2dNeonK1x1S1(const float *input,
const float *filter,
......@@ -44,5 +44,5 @@ void Conv2dNeonK1x1S1(const float *input,
}
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -16,12 +16,12 @@
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
#include "mace/utils/logging.h"
#include "mace/utils/utils.h"
namespace mace {
namespace kernels {
namespace ops {
inline void Conv2dCPUK1x15Calc(const float *in_ptr,
const float *filter_ptr,
......@@ -143,5 +143,5 @@ void Conv2dNeonK1x15S1(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -16,10 +16,10 @@
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
// Ho = 1, Wo = 4, Co = 4
void Conv2dNeonK1x7S1(const float *input,
......@@ -247,5 +247,5 @@ void Conv2dNeonK1x7S1(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -17,10 +17,10 @@
#endif
#include "mace/core/macros.h"
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
// Ho = 2, Wo = 4, Co = 2
void Conv2dNeonK3x3S1(const float *input,
......@@ -658,5 +658,5 @@ void Conv2dNeonK3x3S2(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -16,10 +16,10 @@
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
#define MACE_Conv2dNeonK5x5SnLoadCalc4 \
/* load filter (4 outch x 1 height x 4 width) */ \
......@@ -215,5 +215,5 @@ void Conv2dNeonK5x5S1(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -16,10 +16,10 @@
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
// Ho = 4, Wo = 1, Co = 4
void Conv2dNeonK7x1S1(const float *input,
......@@ -287,5 +287,5 @@ void Conv2dNeonK7x1S1(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -16,10 +16,10 @@
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
#define MACE_Conv2dArmv8NeonK7x7SnLoadCalc4 \
/* load filter (4 outch x 1 height x 4 width) */ \
......@@ -638,5 +638,5 @@ void Conv2dNeonK7x7S3(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -14,11 +14,11 @@
#include <algorithm>
#include "mace/kernels/arm/conv_winograd.h"
#include "mace/kernels/gemm.h"
#include "mace/ops/arm/conv_winograd.h"
#include "mace/ops/gemm.h"
namespace mace {
namespace kernels {
namespace ops {
namespace {
// NCHW => NTCB (T: in tile pixels, B: tile indices)
......@@ -747,5 +747,5 @@ void ConvRef3x3s1(const float *input,
}
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,18 +12,18 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_ARM_CONV_WINOGRAD_H_
#define MACE_KERNELS_ARM_CONV_WINOGRAD_H_
#ifndef MACE_OPS_ARM_CONV_WINOGRAD_H_
#define MACE_OPS_ARM_CONV_WINOGRAD_H_
#if defined(MACE_ENABLE_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif
#include "mace/core/types.h"
#include "mace/kernels/sgemm.h"
#include "mace/ops/sgemm.h"
namespace mace {
namespace kernels {
namespace ops {
void TransformFilter4x4(const float *filter,
const index_t in_channels,
......@@ -70,7 +70,7 @@ void ConvRef3x3s1(const float *input,
const index_t out_channels,
float *output);
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_ARM_CONV_WINOGRAD_H_
#endif // MACE_OPS_ARM_CONV_WINOGRAD_H_
......@@ -19,10 +19,10 @@
#include "mace/core/tensor.h"
#include "mace/core/types.h"
#include "mace/kernels/arm/conv_winograd.h"
#include "mace/ops/arm/conv_winograd.h"
namespace mace {
namespace kernels {
namespace ops {
TEST(ConvWinogradTest, winograd) {
index_t batch = 1;
......@@ -62,11 +62,11 @@ TEST(ConvWinogradTest, winograd) {
return std::max(-1.0f, std::min(1.0f, nd(gen)));
});
kernels::ConvRef3x3s1(input_data, filter_data, batch, in_height, in_width,
ops::ConvRef3x3s1(input_data, filter_data, batch, in_height, in_width,
in_channels, out_channels, output_data_ref);
SGemm sgemm;
kernels::WinoGradConv3x3s1(input_data, filter_data, batch, in_height,
ops::WinoGradConv3x3s1(input_data, filter_data, batch, in_height,
in_width, in_channels, out_channels, 6,
output_data, &sgemm, nullptr);
......@@ -76,5 +76,5 @@ TEST(ConvWinogradTest, winograd) {
}
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_ARM_DECONV_2D_NEON_H_
#define MACE_KERNELS_ARM_DECONV_2D_NEON_H_
#ifndef MACE_OPS_ARM_DECONV_2D_NEON_H_
#define MACE_OPS_ARM_DECONV_2D_NEON_H_
#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
......@@ -22,7 +22,7 @@
#include "mace/core/types.h"
namespace mace {
namespace kernels {
namespace ops {
void Deconv2dNeonK3x3S1(const float *input,
const float *filter,
......@@ -90,7 +90,7 @@ inline float32x4_t neon_vfma_lane_3(float32x4_t a,
}
#endif
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_ARM_DECONV_2D_NEON_H_
#endif // MACE_OPS_ARM_DECONV_2D_NEON_H_
......@@ -13,10 +13,10 @@
// limitations under the License.
#include "mace/core/macros.h"
#include "mace/kernels/arm/deconv_2d_neon.h"
#include "mace/ops/arm/deconv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
void Deconv2dNeonK3x3S1(const float *input,
const float *filter,
......@@ -387,5 +387,5 @@ void Deconv2dNeonK3x3S2(const float *input,
}
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -13,10 +13,10 @@
// limitations under the License.
#include "mace/core/macros.h"
#include "mace/kernels/arm/deconv_2d_neon.h"
#include "mace/ops/arm/deconv_2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
void Deconv2dNeonK4x4S1(const float *input,
const float *filter,
......@@ -501,5 +501,5 @@ void Deconv2dNeonK4x4S2(const float *input,
}
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,13 +12,13 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_ARM_DEPTHWISE_CONV2D_NEON_H_
#define MACE_KERNELS_ARM_DEPTHWISE_CONV2D_NEON_H_
#ifndef MACE_OPS_ARM_DEPTHWISE_CONV2D_NEON_H_
#define MACE_OPS_ARM_DEPTHWISE_CONV2D_NEON_H_
#include "mace/core/types.h"
namespace mace {
namespace kernels {
namespace ops {
void DepthwiseConv2dNeonK3x3S1(const float *input,
const float *filter,
......@@ -42,7 +42,7 @@ void DepthwiseConv2dNeonK3x3S2(const float *input,
const index_t valid_w_stop,
float *output);
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_ARM_DEPTHWISE_CONV2D_NEON_H_
#endif // MACE_OPS_ARM_DEPTHWISE_CONV2D_NEON_H_
......@@ -17,10 +17,10 @@
#endif
#include "mace/core/macros.h"
#include "mace/kernels/arm/depthwise_conv2d_neon.h"
#include "mace/ops/arm/depthwise_conv2d_neon.h"
namespace mace {
namespace kernels {
namespace ops {
namespace {
void DepthwiseConv2dPixel(const float *in_base,
......@@ -381,5 +381,5 @@ void DepthwiseConv2dNeonK3x3S2(const float *input,
} // b
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -17,13 +17,13 @@
#include <vector>
#include "mace/core/operator.h"
#include "mace/kernels/activation.h"
#include "mace/ops/activation.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/batch_norm.h"
#include "mace/ops/opencl/image/batch_norm.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class BatchNormOp;
......@@ -35,7 +35,7 @@ class BatchNormOp<DeviceType::CPU, float> : public Operation {
: Operation(context),
epsilon_(Operation::GetOptionalArg<float>("epsilon",
static_cast<float>(1e-4))),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation", "NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit", 0.0f)) {}
......@@ -144,7 +144,7 @@ class BatchNormOp<DeviceType::GPU, T> : public Operation {
: Operation(context) {
float epsilon = Operation::GetOptionalArg<float>(
"epsilon", static_cast<float>(1e-4));
ActivationType activation = kernels::StringToActivationType(
ActivationType activation = ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation", "NOOP"));
float relux_max_limit = Operation::GetOptionalArg<float>("max_limit", 0.0f);
if (context->device()->opencl_runtime()->UseImageMemory()) {
......@@ -205,5 +205,5 @@ void RegisterBatchNorm(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/runtime/opencl/opencl_runtime.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -54,15 +53,15 @@ void BatchNorm(
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Scale", "ScaleImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<D, float>(&net, "Offset", "OffsetImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<D, float>(&net, "Mean", "MeanImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<D, float>(&net, "Var", "VarImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BatchNorm", "BatchNormBM")
.Input("InputImage")
.Input("ScaleImage")
......
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -51,15 +50,15 @@ void Simple() {
net.TransformDataFormat<D, float>("OutputNCHW", NCHW, "Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Scale", "ScaleImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<D, float>(&net, "Offset", "OffsetImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<D, float>(&net, "Mean", "MeanImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<D, float>(&net, "Var", "VarImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BatchNorm", "BatchNormTest")
.Input("InputImage")
......@@ -75,7 +74,7 @@ void Simple() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
}
// Check
......@@ -135,15 +134,15 @@ TEST_F(BatchNormOpTest, SimpleRandomOPENCL) {
// Run on opencl
BufferToImage<DeviceType::GPU, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, float>(&net, "Scale", "ScaleImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, float>(&net, "Offset", "OffsetImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, float>(&net, "Mean", "MeanImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, float>(&net, "Var", "VarImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BatchNorm", "BatchNormTest")
.Input("InputImage")
......@@ -165,7 +164,7 @@ TEST_F(BatchNormOpTest, SimpleRandomOPENCL) {
net.Sync();
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"),
1e-5, 1e-4);
}
......@@ -214,15 +213,15 @@ TEST_F(BatchNormOpTest, SimpleRandomHalfOPENCL) {
// Run on opencl
BufferToImage<DeviceType::GPU, half>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, half>(&net, "Scale", "ScaleImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, half>(&net, "Offset", "OffsetImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, half>(&net, "Mean", "MeanImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, half>(&net, "Var", "VarImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BatchNorm", "BatchNormTest")
.Input("InputImage")
......@@ -245,7 +244,7 @@ TEST_F(BatchNormOpTest, SimpleRandomHalfOPENCL) {
net.Sync();
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"),
1e-1, 1e-2);
}
......@@ -294,15 +293,15 @@ TEST_F(BatchNormOpTest, ComplexRandomOPENCL) {
// Run on opencl
BufferToImage<DeviceType::GPU, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, float>(&net, "Scale", "ScaleImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, float>(&net, "Offset", "OffsetImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, float>(&net, "Mean", "MeanImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, float>(&net, "Var", "VarImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BatchNorm", "BatchNormTest")
.Input("InputImage")
......@@ -324,7 +323,7 @@ TEST_F(BatchNormOpTest, ComplexRandomOPENCL) {
net.Sync();
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"),
1e-5, 1e-4);
}
......@@ -373,15 +372,15 @@ TEST_F(BatchNormOpTest, ComplexRandomHalfOPENCL) {
// Run on opencl
BufferToImage<DeviceType::GPU, half>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, half>(&net, "Scale", "ScaleImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, half>(&net, "Offset", "OffsetImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, half>(&net, "Mean", "MeanImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<DeviceType::GPU, half>(&net, "Var", "VarImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BatchNorm", "BatchNormTest")
.Input("InputImage")
......@@ -404,7 +403,7 @@ TEST_F(BatchNormOpTest, ComplexRandomHalfOPENCL) {
net.Sync();
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"),
1e-1, 1e-2);
}
......
......@@ -17,11 +17,11 @@
#include "mace/core/operator.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/batch_to_space.h"
#include "mace/ops/opencl/image/batch_to_space.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
class BatchToSpaceOpBase : public Operation {
public:
......@@ -303,5 +303,5 @@ void RegisterBatchToSpaceND(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -42,7 +41,7 @@ void BMBatchToSpace(
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("BatchToSpaceND", "BatchToSpaceNDTest")
.Input("InputImage")
.Output("OutputImage")
......
......@@ -17,13 +17,13 @@
#include <vector>
#include "mace/core/operator.h"
#include "mace/kernels/activation.h"
#include "mace/ops/activation.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/bias_add.h"
#include "mace/ops/opencl/image/bias_add.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class BiasAddOp;
......@@ -139,5 +139,5 @@ void RegisterBiasAdd(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/runtime/opencl/opencl_runtime.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -47,9 +46,9 @@ void BiasAdd(int iters, int batch, int channels, int height, int width) {
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BiasAdd", "BiasAddBM")
.Input("InputImage")
.Input("BiasImage")
......
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -46,9 +45,9 @@ void BiasAddSimple() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BiasAdd", "BiasAddTest")
.Input("InputImage")
......@@ -60,7 +59,7 @@ void BiasAddSimple() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
}
......@@ -116,9 +115,9 @@ TEST_F(BiasAddOpTest, SimpleRandomOPENCL) {
// Run on opencl
BufferToImage<DeviceType::GPU, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, float>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BiasAdd", "BiasAddTest")
.Input("InputImage")
......@@ -131,7 +130,7 @@ TEST_F(BiasAddOpTest, SimpleRandomOPENCL) {
net.Sync();
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-5);
}
......@@ -172,9 +171,9 @@ TEST_F(BiasAddOpTest, ComplexRandomOPENCL) {
// Run on opencl
BufferToImage<DeviceType::GPU, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, float>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("BiasAdd", "BiasAddTest")
.Input("InputImage")
......@@ -187,7 +186,7 @@ TEST_F(BiasAddOpTest, ComplexRandomOPENCL) {
net.Sync();
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-5);
}
......
......@@ -15,11 +15,11 @@
#include <memory>
#include "mace/core/operator.h"
#include "mace/kernels/opencl/buffer/buffer_inverse_transform.h"
#include "mace/kernels/opencl/image/image_to_buffer.h"
#include "mace/ops/opencl/buffer/buffer_inverse_transform.h"
#include "mace/ops/opencl/image/image_to_buffer.h"
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class BufferInverseTransformOp;
......@@ -41,9 +41,9 @@ class BufferInverseTransformOp<DeviceType::GPU, T> : public Operation {
const Tensor *input = this->Input(0);
Tensor *output = this->Output(0);
kernels::BufferType type =
static_cast<kernels::BufferType>(Operation::GetOptionalArg<int>(
"buffer_type", static_cast<int>(kernels::CONV2D_FILTER)));
ops::BufferType type =
static_cast<ops::BufferType>(Operation::GetOptionalArg<int>(
"buffer_type", static_cast<int>(ops::CONV2D_FILTER)));
return kernel_->Compute(context, input, type,
wino_blk_size_, output);
......@@ -63,5 +63,5 @@ void RegisterBufferInverseTransform(OpRegistryBase *op_registry) {
BufferInverseTransformOp, DeviceType::GPU, half);
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/runtime/opencl/opencl_runtime.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......
......@@ -54,103 +54,103 @@ void TestBidirectionTransform(const int type,
} // namespace
TEST(BufferToImageTest, ArgSmall) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::ARGUMENT, {1});
TestBidirectionTransform<DeviceType::GPU, float>(ops::ARGUMENT, {1});
}
TEST(BufferToImageTest, ArgHalfSmall) {
TestBidirectionTransform<DeviceType::GPU, half>(kernels::ARGUMENT, {11});
TestBidirectionTransform<DeviceType::GPU, half>(ops::ARGUMENT, {11});
}
TEST(BufferToImageTest, ArgMedium) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::ARGUMENT, {11});
TestBidirectionTransform<DeviceType::GPU, float>(ops::ARGUMENT, {11});
}
TEST(BufferToImageTest, ArgLarge) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::ARGUMENT, {256});
TestBidirectionTransform<DeviceType::GPU, float>(ops::ARGUMENT, {256});
}
TEST(BufferToImageTest, InputSmallSingleChannel) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::IN_OUT_CHANNEL,
TestBidirectionTransform<DeviceType::GPU, float>(ops::IN_OUT_CHANNEL,
{1, 2, 3, 1});
}
TEST(BufferToImageTest, InputSmallMultipleChannel) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::IN_OUT_CHANNEL,
TestBidirectionTransform<DeviceType::GPU, float>(ops::IN_OUT_CHANNEL,
{1, 2, 3, 3});
}
TEST(BufferToImageTest, InputSmallMultipleBatchAndChannel) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::IN_OUT_CHANNEL,
TestBidirectionTransform<DeviceType::GPU, float>(ops::IN_OUT_CHANNEL,
{3, 2, 3, 3});
}
TEST(BufferToImageTest, InputMedium) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::IN_OUT_CHANNEL,
TestBidirectionTransform<DeviceType::GPU, float>(ops::IN_OUT_CHANNEL,
{3, 13, 17, 128});
}
TEST(BufferToImageTest, InputLarge) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::IN_OUT_CHANNEL,
TestBidirectionTransform<DeviceType::GPU, float>(ops::IN_OUT_CHANNEL,
{3, 64, 64, 256});
}
TEST(BufferToImageTest, Filter1x1Small) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::CONV2D_FILTER,
TestBidirectionTransform<DeviceType::GPU, float>(ops::CONV2D_FILTER,
{5, 3, 1, 1});
}
TEST(BufferToImageTest, Filter1x1Medium) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::CONV2D_FILTER,
TestBidirectionTransform<DeviceType::GPU, float>(ops::CONV2D_FILTER,
{13, 17, 1, 1});
}
TEST(BufferToImageTest, Filter1x1Large) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::CONV2D_FILTER,
TestBidirectionTransform<DeviceType::GPU, float>(ops::CONV2D_FILTER,
{512, 128, 1, 1});
}
TEST(BufferToImageTest, Filter3x3Small) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::CONV2D_FILTER,
TestBidirectionTransform<DeviceType::GPU, float>(ops::CONV2D_FILTER,
{3, 5, 3, 3});
}
TEST(BufferToImageTest, Filter3x3Medium) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::CONV2D_FILTER,
TestBidirectionTransform<DeviceType::GPU, float>(ops::CONV2D_FILTER,
{17, 13, 3, 3});
}
TEST(BufferToImageTest, Filter3x3Large) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::CONV2D_FILTER,
TestBidirectionTransform<DeviceType::GPU, float>(ops::CONV2D_FILTER,
{256, 128, 3, 3});
}
TEST(BufferToImageTest, WeightWidthSmall) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::WEIGHT_WIDTH,
TestBidirectionTransform<DeviceType::GPU, float>(ops::WEIGHT_WIDTH,
{1, 3, 3, 3});
}
TEST(BufferToImageTest, WeightWidthMedium) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::WEIGHT_WIDTH,
TestBidirectionTransform<DeviceType::GPU, float>(ops::WEIGHT_WIDTH,
{11, 13, 13, 17});
}
TEST(BufferToImageTest, WeightWidthLarge) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::WEIGHT_WIDTH,
TestBidirectionTransform<DeviceType::GPU, float>(ops::WEIGHT_WIDTH,
{64, 64, 11, 13});
}
TEST(BufferToImageTest, WeightHeightSmall) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::WEIGHT_HEIGHT,
TestBidirectionTransform<DeviceType::GPU, float>(ops::WEIGHT_HEIGHT,
{2, 1, 1, 1});
}
TEST(BufferToImageTest, WeightHeightMedium) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::WEIGHT_HEIGHT,
TestBidirectionTransform<DeviceType::GPU, float>(ops::WEIGHT_HEIGHT,
{11, 13, 13, 17});
}
TEST(BufferToImageTest, WeightHeightLarge) {
TestBidirectionTransform<DeviceType::GPU, float>(kernels::WEIGHT_HEIGHT,
TestBidirectionTransform<DeviceType::GPU, float>(ops::WEIGHT_HEIGHT,
{64, 16, 11, 13});
}
......@@ -188,7 +188,7 @@ void TestDiffTypeBidirectionTransform(const int type,
} // namespace
TEST(BufferToImageTest, ArgFloatToHalfSmall) {
TestDiffTypeBidirectionTransform<DeviceType::GPU, half>(kernels::ARGUMENT,
TestDiffTypeBidirectionTransform<DeviceType::GPU, half>(ops::ARGUMENT,
{11});
}
......@@ -233,7 +233,7 @@ TEST(BufferToImageTest, ArgStringHalfToHalfSmall) {
const unsigned char input_data[] = {
0xCD, 0x3C, 0x33, 0x40,
};
TestStringHalfBidirectionTransform<DeviceType::GPU, half>(kernels::ARGUMENT,
TestStringHalfBidirectionTransform<DeviceType::GPU, half>(ops::ARGUMENT,
{2}, input_data);
}
......
......@@ -15,11 +15,11 @@
#include <memory>
#include "mace/core/operator.h"
#include "mace/kernels/opencl/buffer/buffer_transform.h"
#include "mace/kernels/opencl/image/buffer_to_image.h"
#include "mace/ops/opencl/buffer/buffer_transform.h"
#include "mace/ops/opencl/image/buffer_to_image.h"
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class BufferTransformOp;
......@@ -41,9 +41,9 @@ class BufferTransformOp<DeviceType::GPU, T> : public Operation {
const Tensor *input = this->Input(0);
Tensor *output = this->Output(0);
kernels::BufferType type =
static_cast<kernels::BufferType>(Operation::GetOptionalArg<int>(
"buffer_type", static_cast<int>(kernels::CONV2D_FILTER)));
ops::BufferType type =
static_cast<ops::BufferType>(Operation::GetOptionalArg<int>(
"buffer_type", static_cast<int>(ops::CONV2D_FILTER)));
return kernel_->Compute(context, input, type,
wino_blk_size_, output);
......@@ -63,5 +63,5 @@ void RegisterBufferTransform(OpRegistryBase *op_registry) {
BufferTransformOp, DeviceType::GPU, half);
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -69,12 +69,12 @@ void TestBidirectionTransform(const int type,
} // namespace
TEST_F(BufferTransformTest, FloatToHalf) {
TestBidirectionTransform<float, half>(kernels::BufferType::IN_OUT_CHANNEL,
TestBidirectionTransform<float, half>(ops::BufferType::IN_OUT_CHANNEL,
{1, 2, 3, 4});
}
TEST_F(BufferTransformTest, HalfToHalf) {
TestBidirectionTransform<half, half>(kernels::BufferType::IN_OUT_CHANNEL,
TestBidirectionTransform<half, half>(ops::BufferType::IN_OUT_CHANNEL,
{1, 2, 3, 4});
}
......@@ -85,7 +85,7 @@ void TestArgumentTransform(const index_t input_size) {
OpDefBuilder("BufferTransform", "BufferTransformTest")
.Input("Input")
.Output("Output")
.AddIntArg("buffer_type", kernels::BufferType::ARGUMENT)
.AddIntArg("buffer_type", ops::BufferType::ARGUMENT)
.AddIntArg("T", DataTypeToEnum<T>::value)
.Finalize(net.NewOperatorDef());
......
......@@ -15,7 +15,7 @@
#include "mace/core/operator.h"
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, typename SrcType>
class CastOp : public Operation {
......@@ -57,5 +57,5 @@ void RegisterCast(OpRegistryBase *op_registry) {
DeviceType::CPU, int32_t);
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -13,7 +13,6 @@
// limitations under the License.
#include "gmock/gmock.h"
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......
......@@ -16,11 +16,11 @@
#include "mace/core/operator.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/channel_shuffle.h"
#include "mace/ops/opencl/image/channel_shuffle.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class ChannelShuffleOp;
......@@ -115,5 +115,5 @@ void RegisterChannelShuffle(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -44,7 +43,7 @@ void ChannelShuffle(
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("ChannelShuffle", "ChannelShuffleTest")
.Input("InputImage")
......
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -61,7 +60,7 @@ TEST_F(ChannelShuffleOpTest, C16G4_OPENCL) {
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31});
BufferToImage<DeviceType::GPU, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("ChannelShuffle", "ChannelShuffleTest")
.Input("InputImage")
......@@ -74,7 +73,7 @@ TEST_F(ChannelShuffleOpTest, C16G4_OPENCL) {
// Transfer output
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
// Check
auto expected = net.CreateTensor<float>(
......
......@@ -18,11 +18,11 @@
#include "mace/utils/quantize.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/concat.h"
#include "mace/ops/opencl/image/concat.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
class ConcatOpBase : public Operation {
public:
......@@ -206,6 +206,9 @@ void RegisterConcat(OpRegistryBase *op_registry) {
MACE_REGISTER_OP(op_registry, "Concat", ConcatOp,
DeviceType::CPU, float);
MACE_REGISTER_OP(op_registry, "Concat", ConcatOp,
DeviceType::CPU, int32_t);
MACE_REGISTER_OP(op_registry, "Concat", ConcatOp,
DeviceType::CPU, uint8_t);
......@@ -218,5 +221,5 @@ void RegisterConcat(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -90,9 +89,9 @@ void OpenclConcatHelper(int iters,
net.AddRandomInput<DeviceType::GPU, float>("Input1", shape1);
BufferToImage<DeviceType::GPU, T>(&net, "Input0", "InputImage0",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, T>(&net, "Input1", "InputImage1",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Concat", "ConcatBM")
.Input("InputImage0")
.Input("InputImage1")
......
......@@ -262,7 +262,7 @@ void OpenclRandomTest(const std::vector<std::vector<index_t>> &shapes,
net.AddInputFromArray<DeviceType::GPU, float>(input_name, shapes[i],
inputs[i]);
BufferToImage<DeviceType::GPU, T>(&net, input_name, image_name,
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
}
auto builder = OpDefBuilder("Concat", "ConcatTest");
......@@ -279,7 +279,7 @@ void OpenclRandomTest(const std::vector<std::vector<index_t>> &shapes,
net.RunOp(DeviceType::GPU);
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
// Check
auto output = net.GetOutput("Output");
......
......@@ -26,20 +26,20 @@
#include "mace/core/future.h"
#include "mace/core/operator.h"
#include "mace/core/tensor.h"
#include "mace/kernels/activation.h"
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/kernels/arm/conv_winograd.h"
#include "mace/kernels/conv_pool_2d_base.h"
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/kernels/gemmlowp_util.h"
#include "mace/ops/activation.h"
#include "mace/ops/arm/conv_2d_neon.h"
#include "mace/ops/arm/conv_winograd.h"
#include "mace/ops/conv_pool_2d_base.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/ops/gemmlowp_util.h"
#include "mace/utils/utils.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/conv_2d.h"
#include "mace/kernels/opencl/buffer/conv_2d.h"
#include "mace/ops/opencl/image/conv_2d.h"
#include "mace/ops/opencl/buffer/conv_2d.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class Conv2dOp;
......@@ -49,7 +49,7 @@ class Conv2dOp<DeviceType::CPU, float> : public ConvPool2dOpBase {
public:
explicit Conv2dOp(OpConstructContext *context)
: ConvPool2dOpBase(context),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit", 0.0f)),
......@@ -712,7 +712,7 @@ class Conv2dOp<DeviceType::CPU, uint8_t> : public ConvPool2dOpBase {
public:
explicit Conv2dOp(OpConstructContext *context)
: ConvPool2dOpBase(context),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit", 0.0f)) {}
......@@ -950,7 +950,7 @@ class Conv2dOp<DeviceType::GPU, T> : public ConvPool2dOpBase {
public:
explicit Conv2dOp(OpConstructContext *context)
: ConvPool2dOpBase(context),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit", 0.0f)) {
......@@ -999,5 +999,5 @@ void RegisterConv2D(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -14,9 +14,8 @@
#include <algorithm>
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -66,11 +65,11 @@ void Conv2d(int iters,
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
.Input("FilterImage")
......
......@@ -15,7 +15,7 @@
#include <fstream>
#include <vector>
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -61,11 +61,11 @@ void TestNHWCSimple3x3VALID() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -81,7 +81,7 @@ void TestNHWCSimple3x3VALID() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
......@@ -127,11 +127,11 @@ void TestNHWCSimple3x3SAME() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -147,7 +147,7 @@ void TestNHWCSimple3x3SAME() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
......@@ -213,9 +213,9 @@ void TestNHWCSimple3x3WithoutBias() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
......@@ -230,7 +230,7 @@ void TestNHWCSimple3x3WithoutBias() {
net.RunOp(D);
// Transfer output
ImageToBuffer<D, T>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
}
......@@ -287,11 +287,11 @@ void TestNHWCCombined3x3() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2DTest")
.Input("InputImage")
......@@ -307,7 +307,7 @@ void TestNHWCCombined3x3() {
net.RunOp(D);
ImageToBuffer<D, T>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
}
......@@ -362,11 +362,11 @@ void TestFusedNHWCSimple3x3VALID() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2DTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -383,7 +383,7 @@ void TestFusedNHWCSimple3x3VALID() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
......@@ -425,9 +425,9 @@ void TestFusedNHWCSimple3x3WithoutBias() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
OpDefBuilder("Conv2D", "Conv2DTest")
.Input("InputImage")
......@@ -443,7 +443,7 @@ void TestFusedNHWCSimple3x3WithoutBias() {
net.RunOp(D);
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
}
......@@ -505,11 +505,11 @@ void TestConv1x1() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, float>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2DTest")
.Input("InputImage")
......@@ -524,7 +524,7 @@ void TestConv1x1() {
net.RunOp(D);
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
}
......@@ -596,11 +596,11 @@ void TestComplexConvNxNS12(const std::vector<index_t> &shape,
// run on gpu
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
......@@ -616,7 +616,7 @@ void TestComplexConvNxNS12(const std::vector<index_t> &shape,
net.RunOp(D);
ImageToBuffer<D, T>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-4,
1e-4);
};
......@@ -705,11 +705,11 @@ void TestHalfComplexConvNxNS12(const std::vector<index_t> &input_shape,
// run on gpu
BufferToImage<D, half>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, half>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, half>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
......@@ -725,7 +725,7 @@ void TestHalfComplexConvNxNS12(const std::vector<index_t> &input_shape,
net.RunOp(D);
ImageToBuffer<D, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-2,
1e-2);
......@@ -857,11 +857,11 @@ void TestDilationConvNxN(const std::vector<index_t> &shape,
// run on gpu
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
......@@ -877,7 +877,7 @@ void TestDilationConvNxN(const std::vector<index_t> &shape,
net.RunOp(D);
ImageToBuffer<D, T>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-4,
1e-4);
};
......@@ -954,11 +954,11 @@ void TestGeneralHalfAtrousConv(const std::vector<index_t> &image_shape,
// run on gpu
BufferToImage<D, half>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, half>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, half>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
......@@ -974,7 +974,7 @@ void TestGeneralHalfAtrousConv(const std::vector<index_t> &image_shape,
net.RunOp(D);
ImageToBuffer<D, float>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-2,
1e-1);
};
......@@ -1041,11 +1041,11 @@ void TestArbitraryPadConvNxN(const std::vector<index_t> &shape,
// run on gpu
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Conv2D", "Conv2dTest")
.Input("InputImage")
......@@ -1060,7 +1060,7 @@ void TestArbitraryPadConvNxN(const std::vector<index_t> &shape,
net.RunOp(D);
ImageToBuffer<D, T>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-4,
1e-4);
};
......
......@@ -12,16 +12,16 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_CONV_POOL_2D_BASE_H_
#define MACE_KERNELS_CONV_POOL_2D_BASE_H_
#ifndef MACE_OPS_CONV_POOL_2D_BASE_H_
#define MACE_OPS_CONV_POOL_2D_BASE_H_
#include <vector>
#include "mace/core/operator.h"
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/conv_pool_2d_util.h"
namespace mace {
namespace kernels {
namespace ops {
class ConvPool2dOpBase : public Operation {
public:
......@@ -40,7 +40,7 @@ class ConvPool2dOpBase : public Operation {
std::vector<int> dilations_;
};
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_CONV_POOL_2D_BASE_H_
#endif // MACE_OPS_CONV_POOL_2D_BASE_H_
......@@ -12,14 +12,14 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/conv_pool_2d_util.h"
#include <algorithm>
#include <cmath>
#include <vector>
namespace mace {
namespace kernels {
namespace ops {
void CalcPaddingAndOutputSize(const index_t *input_shape,
const DataFormat input_format,
......@@ -463,5 +463,5 @@ MaceStatus ConstructNHWCInputWithPadding(const Tensor *input_tensor,
return MaceStatus::MACE_SUCCESS;
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_CONV_POOL_2D_UTIL_H_
#define MACE_KERNELS_CONV_POOL_2D_UTIL_H_
#ifndef MACE_OPS_CONV_POOL_2D_UTIL_H_
#define MACE_OPS_CONV_POOL_2D_UTIL_H_
#include "mace/core/tensor.h"
......@@ -30,7 +30,7 @@ enum RoundType {
CEIL = 1,
};
namespace kernels {
namespace ops {
void CalcPaddingAndOutputSize(const index_t *input_shape,
const DataFormat input_format,
......@@ -113,7 +113,7 @@ MaceStatus ConstructNHWCInputWithPadding(const Tensor *input,
Tensor *output_tensor,
bool padding_same_value = false);
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_CONV_POOL_2D_UTIL_H_
#endif // MACE_OPS_CONV_POOL_2D_UTIL_H_
......@@ -29,7 +29,7 @@ TEST(CoreTest, INIT_MODE) {
OpDefBuilder("BufferTransform", "BufferTransformTest")
.Input("Input")
.Output("B2IOutput")
.AddIntArg("buffer_type", kernels::BufferType::CONV2D_FILTER)
.AddIntArg("buffer_type", ops::BufferType::CONV2D_FILTER)
.AddIntArg("mode", static_cast<int>(NetMode::INIT))
.Finalize(&op_defs[op_defs.size() - 1]);
......@@ -46,18 +46,16 @@ TEST(CoreTest, INIT_MODE) {
OpDefBuilder("BufferInverseTransform", "BufferInverseTransformTest")
.Input("B2IOutput")
.Output("Output")
.AddIntArg("buffer_type", kernels::BufferType::CONV2D_FILTER)
.AddIntArg("buffer_type", ops::BufferType::CONV2D_FILTER)
.Finalize(&op_defs[op_defs.size() - 1]);
NetDef net_def;
for (auto &op_def : op_defs) {
net_def.add_op()->CopyFrom(op_def);
net_def.add_op_types(op_def.type());
}
std::shared_ptr<OpDefRegistryBase> op_def_registry(new OpDefRegistry());
std::shared_ptr<OpRegistryBase> op_registry(new OpRegistry());
std::shared_ptr<OpRegistry> op_registry(new OpRegistry());
auto net = std::unique_ptr<NetBase>(new SerialNet(
op_def_registry.get(), op_registry.get(), &net_def, &ws, device,
op_registry.get(), &net_def, &ws, device,
NetMode::INIT));
MaceStatus status = net->Init();
MACE_CHECK(status == MaceStatus::MACE_SUCCESS);
......@@ -67,7 +65,7 @@ TEST(CoreTest, INIT_MODE) {
EXPECT_TRUE(ws.GetTensor("B2IOutput") != nullptr);
EXPECT_TRUE(ws.GetTensor("Output") == nullptr);
net = std::unique_ptr<NetBase>(new SerialNet(
op_def_registry.get(), op_registry.get(), &net_def, &ws, device));
op_registry.get(), &net_def, &ws, device));
status = net->Init();
MACE_CHECK(status == MaceStatus::MACE_SUCCESS);
status = net->Run();
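The core test above captures the API change that comes with unifying the registries: `SerialNet` no longer takes a separate `OpDefRegistry`, and `net_def.add_op_types()` is gone. A condensed sketch of the new construction, using the same variable names as the hunk:

```c++
// After this change a net is built from the OpRegistry alone.
std::shared_ptr<OpRegistry> op_registry(new OpRegistry());

// Init-mode pass (runs ops marked with NetMode::INIT, e.g. BufferTransform).
auto init_net = std::unique_ptr<NetBase>(new SerialNet(
    op_registry.get(), &net_def, &ws, device, NetMode::INIT));
MACE_CHECK(init_net->Init() == MaceStatus::MACE_SUCCESS);
MACE_CHECK(init_net->Run() == MaceStatus::MACE_SUCCESS);

// Normal pass: same constructor without the NetMode argument.
auto net = std::unique_ptr<NetBase>(new SerialNet(
    op_registry.get(), &net_def, &ws, device));
MACE_CHECK(net->Init() == MaceStatus::MACE_SUCCESS);
MACE_CHECK(net->Run() == MaceStatus::MACE_SUCCESS);
```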
......
......@@ -16,11 +16,11 @@
#include "mace/core/operator.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/crop.h"
#include "mace/ops/opencl/image/crop.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class CropOp : public Operation {
......@@ -143,5 +143,5 @@ void RegisterCrop(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -81,9 +80,9 @@ void OpenclCropHelper(int iters,
net.AddRandomInput<DeviceType::GPU, float>("Input1", shape1);
BufferToImage<DeviceType::GPU, T>(&net, "Input0", "InputImage0",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, T>(&net, "Input1", "InputImage1",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Crop", "CropBM")
.Input("InputImage0")
.Input("InputImage1")
......
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -36,9 +35,9 @@ void RunCrop(const std::vector<index_t> &input_shape,
if (D == GPU) {
BufferToImage<D, float>(&net, "Input0", "InputImage0",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Input1", "InputImage1",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Crop", "CropTest")
.Input("InputImage0")
.Input("InputImage1")
......@@ -69,7 +68,7 @@ void RunCrop(const std::vector<index_t> &input_shape,
if (D == GPU) {
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else if (D == CPU) {
net.TransformDataFormat<DeviceType::CPU, float>("OutputNCHW", NCHW,
"Output", NHWC);
......
......@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/kernels/deconv_2d.h"
#include "mace/ops/deconv_2d.h"
#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
......@@ -27,16 +27,16 @@
#include "mace/core/future.h"
#include "mace/core/operator.h"
#include "mace/core/tensor.h"
#include "mace/kernels/activation.h"
#include "mace/kernels/arm/deconv_2d_neon.h"
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/activation.h"
#include "mace/ops/arm/deconv_2d_neon.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/utils/utils.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/deconv_2d.h"
#include "mace/ops/opencl/image/deconv_2d.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
class Deconv2dOpBase : public Operation {
public:
......@@ -46,9 +46,9 @@ class Deconv2dOpBase : public Operation {
padding_type_(static_cast<Padding>(Operation::GetOptionalArg<int>(
"padding", static_cast<int>(SAME)))),
paddings_(Operation::GetRepeatedArgs<int>("padding_values")),
model_type_(static_cast<kernels::FrameworkType>(
model_type_(static_cast<ops::FrameworkType>(
Operation::GetOptionalArg<int>("framework_type", 0))),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit", 0.0f)) {}
......@@ -180,7 +180,7 @@ class Deconv2dOp<DeviceType::CPU, float> : public Deconv2dOpBase {
const Tensor *filter = this->Input(1);
const Tensor *bias = nullptr;
const Tensor *output_shape_tensor = nullptr;
if (model_type_ == kernels::CAFFE) {
if (model_type_ == ops::CAFFE) {
bias = this->InputSize() >= 3 ? this->Input(2) : nullptr;
} else {
output_shape_tensor =
......@@ -491,7 +491,7 @@ class Deconv2dOp<DeviceType::GPU, T> : public Deconv2dOpBase {
const Tensor *filter = this->Input(1);
const Tensor *bias = nullptr;
const Tensor *output_shape_tensor = nullptr;
if (model_type_ == kernels::CAFFE) {
if (model_type_ == ops::CAFFE) {
bias = this->InputSize() >= 3 ? this->Input(2) : nullptr;
} else {
output_shape_tensor =
......@@ -557,5 +557,5 @@ void RegisterDeconv2D(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,18 +12,18 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_DECONV_2D_H_
#define MACE_KERNELS_DECONV_2D_H_
#ifndef MACE_OPS_DECONV_2D_H_
#define MACE_OPS_DECONV_2D_H_
namespace mace {
namespace kernels {
namespace ops {
enum FrameworkType {
TENSORFLOW = 0,
CAFFE = 1,
};
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_DECONV_2D_H_
#endif // MACE_OPS_DECONV_2D_H_
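The `FrameworkType` enum above is consumed by the Deconv2D op in the hunks just before it: the op reads a `framework_type` int argument (presumably written by the model converter) and uses it to decide what its optional third input means. A condensed restatement of those hunks; the TensorFlow branch is elided in the diff, so it is left as a comment here too:

```c++
// Set in the Deconv2dOpBase constructor (see the hunk above):
model_type_ = static_cast<ops::FrameworkType>(
    Operation::GetOptionalArg<int>("framework_type", 0));  // 0 == TENSORFLOW

// In Run(): Caffe-style deconv carries an optional bias as input 2, while the
// TensorFlow flavour carries an output-shape tensor there.
if (model_type_ == ops::CAFFE) {
  bias = this->InputSize() >= 3 ? this->Input(2) : nullptr;
} else {
  // TensorFlow-style deconv: the extra input is the output-shape tensor;
  // the assignment is elided in the hunk, so it is omitted here as well.
}
```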
......@@ -14,9 +14,8 @@
#include <algorithm>
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -54,11 +53,11 @@ static void Deconv2d(int iters,
{batch, out_h, out_w, output_channels});
if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("Deconv2D", "Deconv2dTest")
.Input("InputImage")
.Input("FilterImage")
......
......@@ -15,8 +15,8 @@
#include <fstream>
#include <vector>
#include "mace/kernels/deconv_2d.h"
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/deconv_2d.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -38,7 +38,7 @@ void RunTestSimple(const std::vector<index_t> &input_shape,
const std::vector<float> &filter_data,
const std::vector<index_t> &expected_shape,
const std::vector<float> &expected_data,
kernels::FrameworkType model_type) {
ops::FrameworkType model_type) {
OpsTestNet net;
// Add input data
const index_t batch = input_shape[0];
......@@ -50,12 +50,12 @@ void RunTestSimple(const std::vector<index_t> &input_shape,
net.TransformDataFormat<D, float>("Filter", HWOI, "FilterOIHW", OIHW);
if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, float>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
BufferToImage<D, float>(&net, "FilterOIHW", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
if (model_type == kernels::FrameworkType::CAFFE) {
ops::BufferType::CONV2D_FILTER);
if (model_type == ops::FrameworkType::CAFFE) {
OpDefBuilder("Deconv2D", "Deconv2dTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -85,12 +85,12 @@ void RunTestSimple(const std::vector<index_t> &input_shape,
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
net.TransformDataFormat<DeviceType::CPU, float>("Input", NHWC, "InputNCHW",
NCHW);
if (model_type == kernels::FrameworkType::CAFFE) {
if (model_type == ops::FrameworkType::CAFFE) {
OpDefBuilder("Deconv2D", "Deconv2dTest")
.Input("InputNCHW")
.Input("FilterOIHW")
......@@ -138,7 +138,7 @@ void TestNHWCSimple3x3SAME_S1() {
{4.5, 4.6, 4.7, 6.5, 6.6, 6.7, 4.5, 4.6, 4.7,
6.5, 6.6, 6.7, 9.5, 9.6, 9.7, 6.5, 6.6, 6.7,
4.5, 4.6, 4.7, 6.5, 6.6, 6.7, 4.5, 4.6, 4.7},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
RunTestSimple<D>({1, 3, 3, 1}, {1, 1, 1, 1, 1, 1, 1, 1, 1}, {0, 0, 0},
1, Padding::VALID, {2, 2},
{0}, {3, 3, 3, 1},
......@@ -147,7 +147,7 @@ void TestNHWCSimple3x3SAME_S1() {
{1, 3, 3, 3},
{4, 4, 4, 6, 6, 6, 4, 4, 4, 6, 6, 6, 9, 9,
9, 6, 6, 6, 4, 4, 4, 6, 6, 6, 4, 4, 4},
kernels::FrameworkType::CAFFE);
ops::FrameworkType::CAFFE);
RunTestSimple<D>({1, 3, 3, 1}, {1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 0, 0},
1, Padding::SAME, {},
{1, 3, 3, 3}, {3, 3, 3, 1},
......@@ -157,7 +157,7 @@ void TestNHWCSimple3x3SAME_S1() {
{54, 66, 78, 126, 147, 168, 130, 146, 162,
198, 225, 252, 405, 450, 495, 366, 399, 432,
354, 378, 402, 630, 669, 708, 502, 530, 558},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
RunTestSimple<D>({1, 3, 3, 1}, {1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 0, 0},
1, Padding::SAME, {2, 2},
{0}, {3, 3, 3, 1},
......@@ -167,7 +167,7 @@ void TestNHWCSimple3x3SAME_S1() {
{54, 66, 78, 126, 147, 168, 130, 146, 162,
198, 225, 252, 405, 450, 495, 366, 399, 432,
354, 378, 402, 630, 669, 708, 502, 530, 558},
kernels::FrameworkType::CAFFE);
ops::FrameworkType::CAFFE);
}
template <DeviceType D>
......@@ -185,7 +185,7 @@ void TestNHWCSimple3x3SAME_S2() {
1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1,
2, 2, 2, 2, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 4, 2, 2, 2,
1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
RunTestSimple<D>({1, 3, 3, 1}, {1, 1, 1, 1, 1, 1, 1, 1, 1}, {0, 0, 0},
2, Padding::SAME, {2, 2},
{0}, {3, 3, 3, 1},
......@@ -198,7 +198,7 @@ void TestNHWCSimple3x3SAME_S2() {
1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1,
2, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 4, 2, 2, 2,
1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1},
kernels::FrameworkType::CAFFE);
ops::FrameworkType::CAFFE);
RunTestSimple<D>({1, 3, 3, 1}, {1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 0, 0},
2, Padding::SAME, {},
{1, 6, 6, 3}, {3, 3, 3, 1},
......@@ -216,7 +216,7 @@ void TestNHWCSimple3x3SAME_S2() {
83, 94, 105, 116, 127, 138, 252, 276, 300, 142, 155, 168,
304, 332, 360, 168, 183, 198, 70, 77, 84, 91, 98, 105, 192,
207, 222, 104, 112, 120, 218, 235, 252, 117, 126, 135},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
RunTestSimple<D>({1, 3, 3, 1}, {1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 0, 0},
2, Padding::SAME, {2, 2},
{0}, {3, 3, 3, 1},
......@@ -229,7 +229,7 @@ void TestNHWCSimple3x3SAME_S2() {
140, 151, 162, 78, 84, 90, 116, 127, 138, 252, 276, 300,
142, 155, 168, 304, 332, 360, 168, 183, 198, 91, 98, 105,
192, 207, 222, 104, 112, 120, 218, 235, 252, 117, 126, 135},
kernels::FrameworkType::CAFFE);
ops::FrameworkType::CAFFE);
}
template <DeviceType D>
......@@ -246,7 +246,7 @@ void TestNHWCSimple3x3SAME_S2_1() {
18, 18, 18, 45, 45, 45, 27, 27, 27, 45, 45, 45, 18, 18, 18,
30, 30, 30, 75, 75, 75, 45, 45, 45, 75, 75, 75, 30, 30, 30,
12, 12, 12, 30, 30, 30, 18, 18, 18, 30, 30, 30, 12, 12, 12},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
}
template <DeviceType D>
......@@ -271,7 +271,7 @@ void TestNHWCSimple3x3VALID_S2() {
1, 1, 1,
1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1,
1, 1, 1},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
}
template <DeviceType D>
......@@ -288,7 +288,7 @@ void TestNHWCSimple3x3VALID_S1() {
366, 399, 432, 234, 252, 270, 146, 157, 168, 354, 378, 402,
630, 669, 708, 502, 530, 558, 294, 309, 324, 133, 140, 147,
306, 321, 336, 522, 546, 570, 398, 415, 432, 225, 234, 243},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
}
template <DeviceType D>
......@@ -297,7 +297,7 @@ void TestNHWCSimple2x2SAME() {
{1, 2, 2, 1}, {3, 3, 1, 1},
{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f},
{1, 2, 2, 1}, {4.f, 4.f, 4.f, 4.f},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
}
template <DeviceType D>
......@@ -308,7 +308,7 @@ void TestNHWCSimple2x2VALID() {
{1, 5, 5, 1},
{1.f, 1.f, 2.f, 1.f, 1.f, 1.f, 1.f, 2.f, 1.f, 1.f, 2.f, 2.f, 4.f,
2.f, 2.f, 1.f, 1.f, 2.f, 1.f, 1.f, 1.f, 1.f, 2.f, 1.f, 1.f},
kernels::FrameworkType::TENSORFLOW);
ops::FrameworkType::TENSORFLOW);
}
} // namespace
......@@ -397,11 +397,11 @@ void TestComplexDeconvNxNS12(const int batch,
std::vector<int> paddings;
std::vector<int> output_shape;
kernels::FrameworkType model_type =
ops::FrameworkType model_type =
padding < 0 ?
kernels::FrameworkType::TENSORFLOW : kernels::FrameworkType::CAFFE;
ops::FrameworkType::TENSORFLOW : ops::FrameworkType::CAFFE;
if (model_type == kernels::FrameworkType::TENSORFLOW) {
if (model_type == ops::FrameworkType::TENSORFLOW) {
if (type == Padding::SAME) {
out_h = (height - 1) * stride_h + 1;
out_w = (width - 1) * stride_w + 1;
......@@ -421,7 +421,7 @@ void TestComplexDeconvNxNS12(const int batch,
paddings.push_back(padding);
}
if (model_type == kernels::FrameworkType::CAFFE) {
if (model_type == ops::FrameworkType::CAFFE) {
OpDefBuilder("Deconv2D", "Deconv2dTest")
.Input("InputNCHW")
.Input("Filter")
......@@ -458,13 +458,13 @@ void TestComplexDeconvNxNS12(const int batch,
// run on gpu
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::CONV2D_FILTER);
ops::BufferType::CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
if (model_type == kernels::FrameworkType::CAFFE) {
if (model_type == ops::FrameworkType::CAFFE) {
OpDefBuilder("Deconv2D", "Deconv2dTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -492,7 +492,7 @@ void TestComplexDeconvNxNS12(const int batch,
net.RunOp(D);
ImageToBuffer<D, T>(&net, "OutputImage", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
ExpectTensorNear<float>(*expected, *net.GetOutput("OPENCLOutput"), 1e-4,
1e-4);
};
......
......@@ -17,11 +17,11 @@
#include "mace/core/operator.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/depth_to_space.h"
#include "mace/ops/opencl/image/depth_to_space.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class DepthToSpaceOp : public Operation {
......@@ -127,5 +127,5 @@ void RegisterDepthToSpace(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/ops/ops_test_util.h"
......@@ -44,7 +43,7 @@ void DepthToSpace(
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("DepthToSpace", "DepthToSpaceBM")
.Input("InputImage")
......
......@@ -15,7 +15,6 @@
#include <fstream>
#include <vector>
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -47,7 +46,7 @@ void RunDepthToSpace(const std::vector<index_t> &input_shape,
} else {
BufferToImage<D, float>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("DepthToSpace", "DepthToSpaceTest")
.Input("InputImage")
.Output("OutputImage")
......@@ -59,7 +58,7 @@ void RunDepthToSpace(const std::vector<index_t> &input_shape,
if (D == DeviceType::GPU) {
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
}
auto expected = net.CreateTensor<float>(expected_shape, expected_data);
ExpectTensorNear<float>(*expected, *net.GetOutput("Output"), 1e-5);
......@@ -136,7 +135,7 @@ void RandomTest(const int block_size,
NHWC);
BufferToImage<D, T>(&net, "Input", "InputImg",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("DepthToSpace", "DepthToSpaceTest")
.Input("InputImg")
......@@ -149,7 +148,7 @@ void RandomTest(const int block_size,
net.RunOp(D);
ImageToBuffer<D, float>(&net, "OutputImg", "OPENCLOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
if (DataTypeToEnum<T>::value == DT_FLOAT) {
ExpectTensorNear<float>(*net.GetTensor("Output"),
......
......@@ -26,24 +26,24 @@
#include "mace/core/future.h"
#include "mace/core/operator.h"
#include "mace/kernels/activation.h"
#include "mace/kernels/arm/depthwise_conv2d_neon.h"
#include "mace/kernels/conv_pool_2d_base.h"
#include "mace/ops/activation.h"
#include "mace/ops/arm/depthwise_conv2d_neon.h"
#include "mace/ops/conv_pool_2d_base.h"
#include "mace/public/mace.h"
#include "mace/utils/quantize.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/depthwise_conv2d.h"
#include "mace/kernels/opencl/buffer/depthwise_conv2d.h"
#include "mace/ops/opencl/image/depthwise_conv2d.h"
#include "mace/ops/opencl/buffer/depthwise_conv2d.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
class DepthwiseConv2dOpBase : public ConvPool2dOpBase {
public:
explicit DepthwiseConv2dOpBase(OpConstructContext *context)
: ConvPool2dOpBase(context),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit", 0.0f)) {}
......@@ -532,5 +532,5 @@ void RegisterDepthwiseConv2d(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -14,9 +14,8 @@
#include <algorithm>
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -81,11 +80,11 @@ void DepthwiseConv2d(int iters,
.Finalize(net.NewOperatorDef());
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::DW_CONV2D_FILTER);
ops::BufferType::DW_CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("DepthwiseConv2d", "DepthwiseConv2dTest")
.Input("InputImage")
.Input("FilterImage")
......
......@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/kernels/conv_pool_2d_util.h"
#include "mace/ops/conv_pool_2d_util.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -53,11 +53,11 @@ void SimpleValidTest() {
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::DW_CONV2D_FILTER);
ops::BufferType::DW_CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("DepthwiseConv2d", "DepthwiseConv2DTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -73,7 +73,7 @@ void SimpleValidTest() {
// Transfer output
ImageToBuffer<D, float>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
......@@ -150,11 +150,11 @@ void ComplexValidTest(index_t batch,
"Output", NHWC);
} else if (D == DeviceType::GPU) {
BufferToImage<D, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, T>(&net, "Filter", "FilterImage",
kernels::BufferType::DW_CONV2D_FILTER);
ops::BufferType::DW_CONV2D_FILTER);
BufferToImage<D, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("DepthwiseConv2d", "DepthwiseConv2DTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -170,7 +170,7 @@ void ComplexValidTest(index_t batch,
// Transfer output
ImageToBuffer<D, T>(&net, "OutputImage", "Output",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
} else {
MACE_NOT_IMPLEMENTED;
......@@ -290,11 +290,11 @@ void TestNxNS12(const index_t height, const index_t width) {
expected->Copy(*net.GetOutput("Output"));
BufferToImage<DeviceType::GPU, T>(&net, "Input", "InputImage",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<DeviceType::GPU, T>(&net, "Filter", "FilterImage",
kernels::BufferType::DW_CONV2D_FILTER);
ops::BufferType::DW_CONV2D_FILTER);
BufferToImage<DeviceType::GPU, T>(&net, "Bias", "BiasImage",
kernels::BufferType::ARGUMENT);
ops::BufferType::ARGUMENT);
OpDefBuilder("DepthwiseConv2d", "DepthwiseConv2DTest")
.Input("InputImage")
.Input("FilterImage")
......@@ -312,7 +312,7 @@ void TestNxNS12(const index_t height, const index_t width) {
// Transfer output
ImageToBuffer<DeviceType::GPU, float>(&net, "OutputImage", "DeviceOutput",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
// Check
if (DataTypeToEnum<T>::value == DT_FLOAT) {
......
......@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/kernels/eltwise.h"
#include "mace/ops/eltwise.h"
#include <algorithm>
#include <cmath>
......@@ -26,11 +26,11 @@
#include "mace/core/tensor.h"
#include "mace/utils/quantize.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/eltwise.h"
#include "mace/ops/opencl/image/eltwise.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
inline index_t GetIndex(const std::vector<index_t> &shape,
......@@ -792,8 +792,8 @@ class EltwiseOp : public Operation {
public:
explicit EltwiseOp(OpConstructContext *context)
: Operation(context),
type_(static_cast<kernels::EltwiseType>(Operation::GetOptionalArg<int>(
"type", static_cast<int>(kernels::EltwiseType::NONE)))),
type_(static_cast<ops::EltwiseType>(Operation::GetOptionalArg<int>(
"type", static_cast<int>(ops::EltwiseType::NONE)))),
coeff_(Operation::GetRepeatedArgs<float>("coeff")),
scalar_input_(Operation::GetOptionalArg<float>("scalar_input", 1.0)),
scalar_input_index_(Operation::GetOptionalArg<int32_t>(
......@@ -934,8 +934,8 @@ class EltwiseOp<DeviceType::CPU, uint8_t> : public Operation {
public:
explicit EltwiseOp(OpConstructContext *context)
: Operation(context),
type_(static_cast<kernels::EltwiseType>(Operation::GetOptionalArg<int>(
"type", static_cast<int>(kernels::EltwiseType::NONE)))),
type_(static_cast<ops::EltwiseType>(Operation::GetOptionalArg<int>(
"type", static_cast<int>(ops::EltwiseType::NONE)))),
coeff_(Operation::GetRepeatedArgs<float>("coeff")),
scalar_input_(Operation::GetOptionalArg<float>("scalar_input", 1.0)),
scalar_input_index_(Operation::GetOptionalArg<int32_t>(
......@@ -1076,9 +1076,9 @@ class EltwiseOp<DeviceType::GPU, T> : public Operation {
public:
explicit EltwiseOp(OpConstructContext *context)
: Operation(context) {
EltwiseType type = static_cast<kernels::EltwiseType>(
EltwiseType type = static_cast<ops::EltwiseType>(
Operation::GetOptionalArg<int>(
"type", static_cast<int>(kernels::EltwiseType::NONE)));
"type", static_cast<int>(ops::EltwiseType::NONE)));
std::vector<float> coeff = Operation::GetRepeatedArgs<float>("coeff");
float scalar_input = Operation::GetOptionalArg<float>("scalar_input", 1.0);
int32_t scalar_input_index = Operation::GetOptionalArg<int32_t>(
......@@ -1121,5 +1121,5 @@ void RegisterEltwise(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,11 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_ELTWISE_H_
#define MACE_KERNELS_ELTWISE_H_
#ifndef MACE_OPS_ELTWISE_H_
#define MACE_OPS_ELTWISE_H_
namespace mace {
namespace kernels {
namespace ops {
enum EltwiseType {
SUM = 0,
......@@ -35,7 +35,7 @@ enum EltwiseType {
inline bool IsLogicalType(EltwiseType type) { return type == EQUAL; }
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_ELTWISE_H_
#endif // MACE_OPS_ELTWISE_H_
......@@ -14,9 +14,8 @@
#include <string>
#include "mace/core/op_def_registry.h"
#include "mace/core/testing/test_benchmark.h"
#include "mace/kernels/eltwise.h"
#include "mace/ops/eltwise.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......@@ -26,7 +25,7 @@ namespace test {
namespace {
template <DeviceType D, typename T>
void EltwiseBenchmark(
int iters, kernels::EltwiseType type, int n, int h, int w, int c) {
int iters, ops::EltwiseType type, int n, int h, int w, int c) {
mace::testing::StopTiming();
OpsTestNet net;
......@@ -36,9 +35,9 @@ void EltwiseBenchmark(
if (D == DeviceType::GPU) {
BufferToImage<D, half>(&net, "Input0", "InputImg0",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
BufferToImage<D, half>(&net, "Input1", "InputImg1",
kernels::BufferType::IN_OUT_CHANNEL);
ops::BufferType::IN_OUT_CHANNEL);
OpDefBuilder("Eltwise", "EltwiseTest")
.Input("InputImg0")
.Input("InputImg1")
......@@ -84,7 +83,7 @@ void EltwiseBenchmark(
mace::testing::MaccProcessed(tot); \
mace::testing::BytesProcessed(tot *(sizeof(TYPE))); \
EltwiseBenchmark<DEVICE, TYPE>( \
iters, static_cast<kernels::EltwiseType>(ELT_TYPE), N, H, W, C); \
iters, static_cast<ops::EltwiseType>(ELT_TYPE), N, H, W, C); \
} \
MACE_BENCHMARK( \
MACE_BM_ELTWISE_##ELT_TYPE##_##N##_##H##_##W##_##C##_##TYPE##_##DEVICE)
......
This diff is collapsed.
......@@ -16,7 +16,7 @@
#include "mace/core/operator.h"
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class ExpandDimsOp;
......@@ -67,5 +67,5 @@ void RegisterExpandDims(OpRegistryBase *op_registry) {
DeviceType::CPU, uint8_t);
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -13,7 +13,6 @@
// limitations under the License.
#include "gmock/gmock.h"
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......
......@@ -16,7 +16,7 @@
#include "mace/core/operator.h"
namespace mace {
namespace kernels {
namespace ops {
template <DeviceType D, class T>
class FillOp;
......@@ -66,5 +66,5 @@ void RegisterFill(OpRegistryBase *op_registry) {
DeviceType::CPU, float);
}
} // namespace kernels
} // namespace ops
} // namespace mace
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/core/op_def_registry.h"
#include "mace/ops/ops_test_util.h"
namespace mace {
......
......@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_KERNELS_FIXPOINT_H_
#define MACE_KERNELS_FIXPOINT_H_
#ifndef MACE_OPS_FIXPOINT_H_
#define MACE_OPS_FIXPOINT_H_
#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
......@@ -23,7 +23,7 @@
#include "mace/core/types.h"
namespace mace {
namespace kernels {
namespace ops {
inline uint8_t FindMax(const uint8_t *xs, const index_t size) {
uint8_t max_value = 0;
......@@ -57,8 +57,8 @@ inline uint8_t FindMax(const uint8_t *xs, const index_t size) {
}
} // namespace kernels
} // namespace ops
} // namespace mace
#endif // MACE_KERNELS_FIXPOINT_H_
#endif // MACE_OPS_FIXPOINT_H_
......@@ -17,10 +17,10 @@
#include <vector>
#include <algorithm>
#include "mace/kernels/fixpoint.h"
#include "mace/ops/fixpoint.h"
namespace mace {
namespace kernels {
namespace ops {
namespace test {
namespace {
......@@ -49,6 +49,6 @@ TEST(FixpointTest, FindMax) {
}
} // namespace test
} // namespace kernels
} // namespace ops
} // namespace mace
This diff is collapsed.
......@@ -19,22 +19,22 @@
#include "mace/core/future.h"
#include "mace/core/operator.h"
#include "mace/core/tensor.h"
#include "mace/kernels/activation.h"
#include "mace/kernels/gemm.h"
#include "mace/kernels/gemmlowp_util.h"
#include "mace/ops/activation.h"
#include "mace/ops/gemm.h"
#include "mace/ops/gemmlowp_util.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/kernels/opencl/image/fully_connected.h"
#include "mace/ops/opencl/image/fully_connected.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace kernels {
namespace ops {
class FullyConnectedOpBase : public Operation {
public:
explicit FullyConnectedOpBase(OpConstructContext *context)
: Operation(context),
activation_(kernels::StringToActivationType(
activation_(ops::StringToActivationType(
Operation::GetOptionalArg<std::string>("activation",
"NOOP"))),
relux_max_limit_(Operation::GetOptionalArg<float>("max_limit", 0.0f)) {}
......@@ -229,5 +229,5 @@ void RegisterFullyConnected(OpRegistryBase *op_registry) {
#endif // MACE_ENABLE_OPENCL
}
} // namespace kernels
} // namespace ops
} // namespace mace
This diff is collapsed.