Commit 692d6aeb authored by BUG1989

initial support for the TIM-VX NPU backend

Parent 89e0b3d6
......@@ -52,4 +52,7 @@ build*/
.vscode
# 3rd party files depend on ACL
3rdparty/
\ No newline at end of file
3rdparty/
# 3rd party files depend on TIM-VX
src/dev/tim-vx/src/
......@@ -83,6 +83,7 @@ option(TENGINE_ENABLE_ACL "Build with Arm Compute Library(ACL) support" OFF)
option(TENGINE_ENABLE_VULKAN "Build with Vulkan GPU compute support" OFF)
option(TENGINE_ENABLE_TENSORRT "Build with nVIDIA TensorRT support" OFF)
option(TENGINE_ENABLE_CUDABACKEND "Build with nVIDIA cuda support" OFF)
option(TENGINE_ENABLE_TIM_VX "Build with VSI Tensor Interface Module for OpenVX support" OFF)
# add_definitions(-DCONFIG_DISABLE_PARAM_ACCESS)
# add_definitions(-DCONFIG_INTERN_ALLOCATOR)
......
......@@ -77,6 +77,7 @@ Tengine Lite references and draws on the following projects:
- [ACL](https://github.com/ARM-software/ComputeLibrary)
- [stb](https://github.com/nothings/stb)
- [convertmodel](https://convertmodel.com)
- [TIM-VX](https://github.com/VeriSilicon/TIM-VX)
## License
......
......@@ -45,7 +45,7 @@ The core code of Tengine Lite consists of 4 modules:
### Model Convert tool
- [Pre-compiled version](https://github.com/OAID/Tengine-Convert-Tools/releases/download/v0.1/tm_convert_tool): Pre-compiled model convert tool is provided on Linux system;
- [Pre-compiled version](https://github.com/OAID/Tengine/releases/download/lite-v1.2/convert_tool.zip): Pre-compiled model convert tool is provided on Linux system;
- [Online Convert tool](https://convertmodel.com/#outputFormat=tengine): Based on WebAssembly (the models are converted locally by browsers, no private data will be uploaded);
- [Source Compilation](https://github.com/OAID/Tengine-Convert-Tools): Refer to **Tengine-Convert-Tools** project, convert tool could be built by users.
......@@ -66,11 +66,13 @@ Tengine Lite got ideas and developed based on these projects:
- [MegEngine](https://github.com/MegEngine/MegEngine)
- [ONNX](https://github.com/onnx/onnx)
- [ncnn](https://github.com/Tencent/ncnn)
- [FeatherCNN](https://github.com/Tencent/FeatherCNN)
- [MNN](https://github.com/alibaba/MNN)
- [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite)
- [ACL](https://github.com/ARM-software/ComputeLibrary)
- [stb](https://github.com/nothings/stb)
- [convertmodel](https://convertmodel.com)
- [TIM-VX](https://github.com/VeriSilicon/TIM-VX)
## License
......
doc/architecture.png (binary image updated: 97.9 KB → 103.1 KB)
# Tengine Lite VeriSilicon TIM-VX User Manual
## Brief
TIM-VX is a software integration module provided by VeriSilicon to facilitate the deployment of neural networks on OpenVX-enabled ML accelerators.
Tengine Lite supports integration with VeriSilicon's TIM-VX library to run CNN inference on the Khadas VIM3 (Amlogic A311D).
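In the demo added by this commit, the NPU is selected through Tengine's standard C API by attaching the "TIMVX" device to a context before the graph is created. A minimal sketch of that flow (the helper name is illustrative; error handling and input setup are trimmed, the complete example is `examples/tm_classification_timvx.c`):
```c
#include <stdio.h>
#include "tengine_c_api.h"

int run_on_timvx(const char* model_file)
{
    if (init_tengine() != 0)
        return -1;

    /* create an empty context and attach the TIM-VX device to it */
    context_t timvx_context = create_context("timvx", 1);
    if (add_context_device(timvx_context, "TIMVX") < 0)
        return -1;

    /* graphs created on this context are offloaded to the NPU where supported */
    graph_t graph = create_graph(timvx_context, "tengine", model_file);
    if (NULL == graph)
        return -1;

    /* ... set input shape/buffer, prerun_graph_multithread(), run_graph() ... */

    postrun_graph(graph);
    destroy_graph(graph);
    release_tengine();
    return 0;
}
```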
## Build
At present, the following steps are only supported on the Khadas VIM3.
### TIM-VX NPU Library
#### Download the TIM-VX source code
```bash
$ git clone https://github.com/VeriSilicon/TIM-VX.git
```
#### Download the prebuilt SDK for the A311D
```bash
$ wget -c https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.28/aarch64_A311D_D312513_A294074_R311680_T312233_O312045.tgz
$ tar zxvf aarch64_A311D_D312513_A294074_R311680_T312233_O312045.tgz
$ mv aarch64_A311D_D312513_A294074_R311680_T312233_O312045 prebuild-sdk-a311d
```
### Tengine Lite
#### Download Tengine Lite
```bash
$ git clone https://github.com/OAID/Tengine.git tengine-lite
$ cd tengine-lite
```
#### Prepare dependency files
```bash
$ cd <tengine-lite-root-dir>
$ mkdir -p ./3rdparty/tim-vx/lib/aarch64
$ mkdir -p ./3rdparty/tim-vx/include
$ cp -rf ../TIM-VX/include/* ./3rdparty/tim-vx/include/
$ cp -rf ../TIM-VX/src ./src/dev/tim-vx/
$ cp -rf ../prebuild-sdk-a311d/include/* ./3rdparty/tim-vx/include/
$ cp -rf ../prebuild-sdk-a311d/lib/*.so ./3rdparty/tim-vx/lib/aarch64/
```
#### Build Tengine Lite
```bash
$ mkdir build && cd build
$ cmake -DTENGINE_ENABLE_TIM_VX=ON -DTENGINE_ENABLE_TIM_VX_INTEGRATION=ON ..
$ make -j4
$ make install
```
## Demo
#### Dependent libraries
```
3rdparty/tim-vx/lib/
├── libOpenVX.so.1
├── libVSC.so
├── libGAL.so
├── libArchModelSw.so
└── libNNArchPerf.so
build-tim-vx-arm64/install/lib/
└── libtengine-lite.so
```
On the Khadas VIM3, these prebuilt libraries need to replace the corresponding ones under the board's /lib/ path.
#### Set uint8 inference mode
The TIM-VX library requires a uint8 quantized network model, so the runtime precision must be set to uint8:
```c
/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_UINT8;
opt.affinity = 0;
```
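Because the graph then runs in uint8, the float image produced by the usual preprocessing has to be quantized with the input tensor's scale and zero point (read back via `get_tensor_quant_param()` after prerun). A minimal sketch of that conversion (the helper name is illustrative), matching what the demo's `get_input_uint8_data()` does:
```c
#include <math.h>
#include <stdint.h>

/* quantize a preprocessed float buffer to uint8 using the input tensor's
   asymmetric quantization parameters, clamping to the valid [0, 255] range */
static void quantize_input(const float* fp32, uint8_t* u8, int count,
                           float input_scale, int input_zero_point)
{
    for (int i = 0; i < count; i++)
    {
        int v = (int)round(fp32[i] / input_scale + (float)input_zero_point);
        if (v > 255) v = 255;
        if (v < 0)   v = 0;
        u8[i] = (uint8_t)v;
    }
}
```
Output tensors go the other way: the uint8 result is dequantized with the output tensor's scale and zero point before the top-k scores are printed.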
#### Result
```
[khadas@Khadas tengine-lite]# ./tm_classification_timvx -m squeezenet_uint8.tmfile -i cat.jpg -r 1 -s 0.017,0.017,0.017 -r 10
Tengine plugin allocator TIMVX is registered.
Image height not specified, use default 227
Image width not specified, use default 227
Mean value not specified, use default 104.0, 116.7, 122.7
tengine-lite library version: 1.2-dev
TIM-VX prerun.
model file : squeezenet_uint8.tmfile
image file : cat.jpg
img_h, img_w, scale[3], mean[3] : 227 227 , 0.017 0.017 0.017, 104.0 116.7 122.7
Repeat 10 times, thread 1, avg time 2.95 ms, max_time 3.42 ms, min_time 2.76 ms
--------------------------------------
34.786182, 278
33.942883, 287
33.732056, 280
32.045452, 277
30.780502, 282
```
......@@ -7,7 +7,8 @@
- [ ] fix the Float32 bugs of Vulkan
- [ ] support the model type of PaddlePaddle
- [x] support the model type of OneFlow
- [ ] open-source the plugin implementation of NPU (A311D)
- [x] open-source the plugin implementation of NPU (A311D)
- [x] open-source the plugin implementation of CUDA
- [x] open-source the plugin implementation of TensorRT
- [ ] open-source the plugin implementation of NNIE
- [x] add more test cases
......@@ -25,6 +25,7 @@ tengine_example(tm_classification_int8 tm_classification_int8.c)
tengine_example(tm_classification_uint8 tm_classification_uint8.c)
tengine_example(tm_classification_vulkan tm_classification_vulkan.c)
tengine_example(tm_classification_acl tm_classification_acl.c)
tengine_example(tm_classification_timvx tm_classification_timvx.c)
tengine_example(tm_classification_trt tm_classification_trt.cpp)
tengine_example(tm_classification_cuda tm_classification_cuda.cpp)
tengine_example(tm_mobilenet_ssd tm_mobilenet_ssd.c)
......
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2020, OPEN AI LAB
* Author: qtang@openailab.com
*/
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <float.h>
#include "common.h"
#include "tengine_c_api.h"
#include "tengine_operations.h"
#define DEFAULT_IMG_H 227
#define DEFAULT_IMG_W 227
#define DEFAULT_SCALE1 1.f
#define DEFAULT_SCALE2 1.f
#define DEFAULT_SCALE3 1.f
#define DEFAULT_MEAN1 104.007
#define DEFAULT_MEAN2 116.669
#define DEFAULT_MEAN3 122.679
#define DEFAULT_LOOP_COUNT 1
#define DEFAULT_THREAD_COUNT 1
#define DEFAULT_CPU_AFFINITY 255
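/* quantize the preprocessed float image to uint8 with the input tensor's scale and zero point, clamping to [0, 255] */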
void get_input_uint8_data(const char* image_file, uint8_t* input_data, int img_h, int img_w, float* mean, float* scale,
float input_scale, int zero_point)
{
image img = imread_process(image_file, img_w, img_h, mean, scale);
float* image_data = ( float* )img.data;
for (int i = 0; i < img_w * img_h * 3; i++)
{
int udata = (round)(image_data[i] / input_scale + (float)zero_point);
if (udata > 255)
udata = 255;
else if (udata < 0)
udata = 0;
input_data[i] = udata;
}
free_image(img);
}
int tengine_classify(const char* model_file, const char* image_file, int img_h, int img_w, float* mean, float* scale,
int loop_count, int num_thread, int affinity)
{
/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_UINT8;
opt.affinity = affinity;
/* initialize tengine */
if (init_tengine() != 0)
{
fprintf(stderr, "Initial tengine failed.\n");
return -1;
}
fprintf(stderr, "tengine-lite library version: %s\n", get_tengine_version());
/* create VeriSilicon TIM-VX backend */
context_t timvx_context = create_context("timvx", 1);
int rtt = add_context_device(timvx_context, "TIMVX");
if (0 > rtt)
{
fprintf(stderr, " add_context_device VSI DEVICE failed.\n");
return -1;
}
/* create graph, load tengine model xxx.tmfile */
graph_t graph = create_graph(timvx_context, "tengine", model_file);
if (NULL == graph)
{
fprintf(stderr, "Create graph failed.\n");
fprintf(stderr, "errno: %d \n", get_tengine_errno());
return -1;
}
/* set the input shape to initial the graph, and prerun graph to infer shape */
int img_size = img_h * img_w * 3;
int dims[] = {1, 3, img_h, img_w}; // nchw
uint8_t* input_data = ( uint8_t* )malloc(img_size);
tensor_t input_tensor = get_graph_input_tensor(graph, 0, 0);
if (input_tensor == NULL)
{
fprintf(stderr, "Get input tensor failed\n");
return -1;
}
if (set_tensor_shape(input_tensor, dims, 4) < 0)
{
fprintf(stderr, "Set input tensor shape failed\n");
return -1;
}
if (set_tensor_buffer(input_tensor, input_data, img_size) < 0)
{
fprintf(stderr, "Set input tensor buffer failed\n");
return -1;
}
/* prerun graph, set work options(num_thread, cluster, precision) */
if (prerun_graph_multithread(graph, opt) < 0)
{
fprintf(stderr, "Prerun multithread graph failed.\n");
return -1;
}
/* prepare process input data, set the data mem to input tensor */
float input_scale = 0.f;
int input_zero_point = 0;
get_tensor_quant_param(input_tensor, &input_scale, &input_zero_point, 1);
get_input_uint8_data(image_file, input_data, img_h, img_w, mean, scale, input_scale, input_zero_point);
/* run graph */
double min_time = DBL_MAX;
double max_time = DBL_MIN;
double total_time = 0.;
for (int i = 0; i < loop_count; i++)
{
double start = get_current_time();
if (run_graph(graph, 1) < 0)
{
fprintf(stderr, "Run graph failed\n");
return -1;
}
double end = get_current_time();
double cur = end - start;
total_time += cur;
if (min_time > cur)
min_time = cur;
if (max_time < cur)
max_time = cur;
}
fprintf(stderr, "\nmodel file : %s\n", model_file);
fprintf(stderr, "image file : %s\n", image_file);
fprintf(stderr, "img_h, img_w, scale[3], mean[3] : %d %d , %.3f %.3f %.3f, %.1f %.1f %.1f\n", img_h, img_w,
scale[0], scale[1], scale[2], mean[0], mean[1], mean[2]);
fprintf(stderr, "Repeat %d times, thread %d, avg time %.2f ms, max_time %.2f ms, min_time %.2f ms\n", loop_count,
num_thread, total_time / loop_count, max_time, min_time);
fprintf(stderr, "--------------------------------------\n");
/* get the result of classification */
tensor_t output_tensor = get_graph_output_tensor(graph, 0, 0);
uint8_t* output_u8 = ( uint8_t* )get_tensor_buffer(output_tensor);
int output_size = get_tensor_buffer_size(output_tensor);
/* dequant */
float output_scale = 0.f;
int output_zero_point = 0;
get_tensor_quant_param(output_tensor, &output_scale, &output_zero_point, 1);
float* output_data = ( float* )malloc(output_size * sizeof(float));
for (int i = 0; i < output_size; i++)
output_data[i] = (( float )output_u8[i] - ( float )output_zero_point) * output_scale;
print_topk(output_data, output_size, 5);
fprintf(stderr, "--------------------------------------\n");
/* release tengine */
free(input_data);
free(output_data);
postrun_graph(graph);
destroy_graph(graph);
release_tengine();
return 0;
}
void show_usage()
{
fprintf(
stderr,
"[Usage]: [-h]\n [-m model_file] [-i image_file]\n [-g img_h,img_w] [-s scale[0],scale[1],scale[2]] [-w "
"mean[0],mean[1],mean[2]] [-r loop_count] [-t thread_count] [-a cpu_affinity]\n");
fprintf(
stderr,
"\nmobilenet example: \n ./classification -m /path/to/mobilenet.tmfile -i /path/to/img.jpg -g 224,224 -s "
"0.017,0.017,0.017 -w 104.007,116.669,122.679\n");
}
int main(int argc, char* argv[])
{
int loop_count = DEFAULT_LOOP_COUNT;
int num_thread = DEFAULT_THREAD_COUNT;
int cpu_affinity = DEFAULT_CPU_AFFINITY;
char* model_file = NULL;
char* image_file = NULL;
float img_hw[2] = {0.f};
int img_h = 0;
int img_w = 0;
float mean[3] = {-1.f, -1.f, -1.f};
float scale[3] = {0.f, 0.f, 0.f};
int res;
while ((res = getopt(argc, argv, "m:i:l:g:s:w:r:t:a:h")) != -1)
{
switch (res)
{
case 'm':
model_file = optarg;
break;
case 'i':
image_file = optarg;
break;
case 'g':
split(img_hw, optarg, ",");
img_h = ( int )img_hw[0];
img_w = ( int )img_hw[1];
break;
case 's':
split(scale, optarg, ",");
break;
case 'w':
split(mean, optarg, ",");
break;
case 'r':
loop_count = atoi(optarg);
break;
case 't':
num_thread = atoi(optarg);
break;
case 'a':
cpu_affinity = atoi(optarg);
break;
case 'h':
show_usage();
return 0;
default:
break;
}
}
/* check files */
if (model_file == NULL)
{
fprintf(stderr, "Error: Tengine model file not specified!\n");
show_usage();
return -1;
}
if (image_file == NULL)
{
fprintf(stderr, "Error: Image file not specified!\n");
show_usage();
return -1;
}
if (!check_file_exist(model_file) || !check_file_exist(image_file))
return -1;
if (img_h == 0)
{
img_h = DEFAULT_IMG_H;
fprintf(stderr, "Image height not specified, use default %d\n", img_h);
}
if (img_w == 0)
{
img_w = DEFAULT_IMG_W;
fprintf(stderr, "Image width not specified, use default %d\n", img_w);
}
if (scale[0] == 0.f || scale[1] == 0.f || scale[2] == 0.f)
{
scale[0] = DEFAULT_SCALE1;
scale[1] = DEFAULT_SCALE2;
scale[2] = DEFAULT_SCALE3;
fprintf(stderr, "Scale value not specified, use default %.1f, %.1f, %.1f\n", scale[0], scale[1], scale[2]);
}
if (mean[0] == -1.0 || mean[1] == -1.0 || mean[2] == -1.0)
{
mean[0] = DEFAULT_MEAN1;
mean[1] = DEFAULT_MEAN2;
mean[2] = DEFAULT_MEAN3;
fprintf(stderr, "Mean value not specified, use default %.1f, %.1f, %.1f\n", mean[0], mean[1], mean[2]);
}
if (tengine_classify(model_file, image_file, img_h, img_w, mean, scale, loop_count, num_thread, cpu_affinity) < 0)
return -1;
return 0;
}
......@@ -172,8 +172,6 @@ int tengine_classify(const char* model_file, const char* image_file, int img_h,
/* release tengine */
free(input_data);
free(output_data);
release_graph_tensor(input_tensor);
release_graph_tensor(output_tensor);
postrun_graph(graph);
destroy_graph(graph);
release_tengine();
......
......@@ -200,6 +200,87 @@ if (TENGINE_ENABLE_TENSORRT)
file(GLOB_RECURSE TENGINE_BACKEND_TENSORRT_OPS "${CMAKE_CURRENT_SOURCE_DIR}/dev/tensorrt/op/*.cpp")
endif ()
if (TENGINE_ENABLE_TIM_VX)
if (${TENGINE_TARGET_PROCESSOR} MATCHES "ARM")
set(TIM_VX_ARCH "aarch64")
elseif (${TENGINE_TARGET_PROCESSOR} MATCHES "X86")
set(TIM_VX_ARCH "x86_64")
else()
message(FATAL_ERROR "Tengine: Unsupported processor: ${TENGINE_TARGET_PROCESSOR}")
endif()
if (TENGINE_ENABLE_TIM_VX_INTEGRATION)
set(VSI_TIM_NAME "tim_vx_internal")
set(VSI_TIM_VX_BASE "${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/src/tim/vx")
aux_source_directory(${VSI_TIM_VX_BASE} VSI_TIM_VX_SRC)
aux_source_directory(${VSI_TIM_VX_BASE}/ops VSI_TIM_OPS_SRC)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src VSI_TIM_INTERNAL_SRC)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/kernel VSI_TIM_INTERNAL_KERNEL)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/kernel/cl VSI_TIM_INTERNAL_KERNEL_CL)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/kernel/cpu VSI_TIM_INTERNAL_KERNEL_CPU)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/kernel/evis VSI_TIM_INTERNAL_KERNEL_EVIS)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/kernel/vx VSI_TIM_INTERNAL_KERNEL_VX)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/ops VSI_TIM_INTERNAL_OPS)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/client VSI_TIM_INTERNAL_CLIENT)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/libnnext VSI_TIM_INTERNAL_LIBNNEXT)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/libnnext/ops/kernel VSI_TIM_INTERNAL_LIBNNEXT_OPS_KERNEL)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/quantization VSI_TIM_INTERNAL_QUANTIZATION)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/custom/ops VSI_TIM_INTERNAL_CUSTOM_OPS)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/custom/ops/kernel VSI_TIM_INTERNAL_CUSTOM_OPS_KERNEL)
aux_source_directory(${VSI_TIM_VX_BASE}/internal/src/utils VSI_TIM_INTERNAL_UTILS)
list(APPEND VSI_TIM_VX_ALL_SRC
${VSI_TIM_VX_SRC}
${VSI_TIM_OPS_SRC}
${VSI_TIM_INTERNAL_SRC}
${VSI_TIM_INTERNAL_KERNEL}
${VSI_TIM_INTERNAL_KERNEL_CL}
${VSI_TIM_INTERNAL_KERNEL_CPU}
${VSI_TIM_INTERNAL_KERNEL_EVIS}
${VSI_TIM_INTERNAL_KERNEL_VX}
${VSI_TIM_INTERNAL_OPS}
${VSI_TIM_INTERNAL_CLIENT}
${VSI_TIM_INTERNAL_LIBNNEXT}
${VSI_TIM_INTERNAL_LIBNNEXT_OPS_KERNEL}
${VSI_TIM_INTERNAL_QUANTIZATION}
${VSI_TIM_INTERNAL_CUSTOM_OPS}
${VSI_TIM_INTERNAL_CUSTOM_OPS_KERNEL}
${VSI_TIM_INTERNAL_UTILS}
)
#message("VSI_TIM_VX_ALL_SRC=${VSI_TIM_VX_ALL_SRC}")
add_library(${VSI_TIM_NAME} STATIC ${VSI_TIM_VX_ALL_SRC})
target_link_directories(${VSI_TIM_NAME} PUBLIC ${CMAKE_SOURCE_DIR}/3rdparty/tim-vx/lib/${TIM_VX_ARCH})
target_link_libraries(${VSI_TIM_NAME} PRIVATE CLC GAL OpenVX OpenVXU VSC ArchModelSw NNArchPerf)
target_include_directories(${VSI_TIM_NAME} PRIVATE ${CMAKE_SOURCE_DIR}/3rdparty/tim-vx/include)
target_include_directories(${VSI_TIM_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/include)
target_include_directories(${VSI_TIM_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/include/tim/vx)
target_include_directories(${VSI_TIM_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/src/tim/vx)
target_include_directories(${VSI_TIM_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/src/tim/vx/internal/include)
set_target_properties(${VSI_TIM_NAME} PROPERTIES CXX_STANDARD_REQUIRED 14)
set_target_properties(${VSI_TIM_NAME} PROPERTIES CXX_STANDARD 14)
set(VSI_TIM_OVXLIB_API_ATTR "__attribute__\(\(visibility\(\"default\"\)\)\)")
target_compile_definitions(${VSI_TIM_NAME} PRIVATE "-DOVXLIB_API=${VSI_TIM_OVXLIB_API_ATTR}")
target_compile_options(${VSI_TIM_NAME} PRIVATE $<$<OR:$<COMPILE_LANGUAGE:C>,$<COMPILE_LANGUAGE:CXX>>:-fPIC>)
target_compile_options(${VSI_TIM_NAME} PRIVATE $<$<OR:$<COMPILE_LANGUAGE:C>,$<COMPILE_LANGUAGE:CXX>>:-O0>)
target_compile_options(${VSI_TIM_NAME} PRIVATE $<$<OR:$<COMPILE_LANGUAGE:C>,$<COMPILE_LANGUAGE:CXX>>:-g>)
endif()
list(APPEND TENGINE_INCLUDE_DIRS_PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx)
list(APPEND TENGINE_INCLUDE_DIRS_PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/op)
list(APPEND TENGINE_INCLUDE_DIRS_PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/include)
list(APPEND TENGINE_INCLUDE_DIRS_PRIVATE ${CMAKE_SOURCE_DIR}/3rdparty/tim-vx/include)
list(APPEND TENGINE_TIM_VX_LIB_DIRS ${CMAKE_SOURCE_DIR}/3rdparty/tim-vx/lib/${TIM_VX_ARCH})
file(GLOB TENGINE_BACKEND_TIM_VX_BASE "${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/*.cc")
file(GLOB TENGINE_BACKEND_TIM_VX_OPS "${CMAKE_CURRENT_SOURCE_DIR}/dev/tim-vx/op/*.cc")
endif ()
# add nVIDIA cudabackend support
if (TENGINE_ENABLE_CUDABACKEND)
enable_language(CUDA)
......@@ -299,7 +380,9 @@ if (${TENGINE_TARGET_PROCESSOR} MATCHES "ARM")
${TENGINE_BACKEND_TENSORRT_BASE}
${TENGINE_BACKEND_TENSORRT_OPS}
${TENGINE_BACKEND_CUDABACKEND_BASE}
${TENGINE_BACKEND_CUDABACKEND_OPS})
${TENGINE_BACKEND_CUDABACKEND_OPS}
${TENGINE_BACKEND_TIM_VX_BASE}
${TENGINE_BACKEND_TIM_VX_OPS})
elseif (${TENGINE_TARGET_PROCESSOR} MATCHES "X86")
add_library(${CMAKE_PROJECT_NAME} SHARED
${TENGINE_LIB_SRCS} ${TENGINE_FRONT_END_SRCS}
......@@ -313,7 +396,9 @@ elseif (${TENGINE_TARGET_PROCESSOR} MATCHES "X86")
${TENGINE_BACKEND_TENSORRT_BASE}
${TENGINE_BACKEND_TENSORRT_OPS}
${TENGINE_BACKEND_CUDABACKEND_BASE}
${TENGINE_BACKEND_CUDABACKEND_OPS})
${TENGINE_BACKEND_CUDABACKEND_OPS}
${TENGINE_BACKEND_TIM_VX_BASE}
${TENGINE_BACKEND_TIM_VX_OPS})
elseif (${TENGINE_TARGET_PROCESSOR} MATCHES "MIPS")
add_definitions(-mips64r2)
add_definitions(-mabi=64)
......@@ -336,8 +421,10 @@ else()
endif()
if (NOT TENGINE_FORCE_SKIP_OPENMP)
TENGINE_USE_LIB_OPENMP(${CMAKE_PROJECT_NAME})
endif()
TENGINE_USE_LIB_OPENMP(${CMAKE_PROJECT_NAME})
# show linking libraries
if(TENGINE_VERBOSE)
message (STATUS "TENGINE: 'TENGINE_LINKING_LIBRARIES_PRIVATE' is ${TENGINE_LINKING_LIBRARIES_PRIVATE}.")
......@@ -386,6 +473,15 @@ if (TENGINE_ENABLE_TENSORRT)
list(APPEND TENGINE_LINKING_LIBRARIES_PRIVATE cudart)
endif()
if (TENGINE_ENABLE_TIM_VX)
target_link_directories(${CMAKE_PROJECT_NAME} PUBLIC ${TENGINE_TIM_VX_LIB_DIRS})
if (TENGINE_ENABLE_TIM_VX_INTEGRATION)
list(APPEND TENGINE_LINKING_LIBRARIES_PRIVATE ${VSI_TIM_NAME})
else()
list(APPEND TENGINE_LINKING_LIBRARIES_PRIVATE tim-vx)
endif()
endif()
if (TENGINE_ENABLE_CUDABACKEND)
target_compile_options(${CMAKE_PROJECT_NAME} PRIVATE $<$<COMPILE_LANGUAGE:CUDA>: ${TENGINE_COMPILE_DEFINITION_CUDA_PRIVATE}>)
target_compile_options(${CMAKE_PROJECT_NAME} PRIVATE $<$<COMPILE_LANGUAGE:CUDA>: ${TENGINE_COMPILE_OPTIONS_CUDA_PRIVATE}>)
......
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
#include "eltwise_param.h"
}
bool VXEngine::AddEltwisSumNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_RELU.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
std::vector<std::shared_ptr<tim::vx::Tensor> > add_in_tensor(ir_node->input_num);
for (int i = 0; i < ir_node->input_num; i++)
{
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[i]);
add_in_tensor[i] = this->vx_tensor_map[input_tensor->idx];
fprintf(stderr,"\nadd_in_tensor.shape()\n");
for (int j = 0; j < 4; j++)
{
fprintf(stderr,"%d ",add_in_tensor[i]->GetShape()[j]);
}
}
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
fprintf(stderr,"\nadd_out_tensor.shape()\n");
for (int j = 0; j < 4; j++)
{
fprintf(stderr,"%d ",this->vx_tensor_map[output_tensor->idx]->GetShape()[j]);
}
eltwise_param* param = (eltwise_param*)ir_node->op.param_mem;
switch (param->type)
{
case ELT_SUM:
{
auto eltsum = graph->CreateOperation<tim::vx::ops::Add>();
(*eltsum)
.BindInputs(add_in_tensor)
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
break;
}
default:
break;
}
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
}
bool VXEngine::AddClipNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_RELU.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[0]);
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
auto relu = this->graph->CreateOperation<tim::vx::ops::Relu6>();
(*relu).BindInput( this->vx_tensor_map[input_tensor->idx] )
.BindOutput({ this->vx_tensor_map[output_tensor->idx] });
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
#include "concat_param.h"
}
bool VXEngine::AddConcatNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_CONCAT.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
std::vector<std::shared_ptr<tim::vx::Tensor> > concat_in_tensor(ir_node->input_num);
for (int i = 0; i < ir_node->input_num; i++)
{
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[i]);
concat_in_tensor[i] = this->vx_tensor_map[input_tensor->idx];
}
struct concat_param* param = (struct concat_param*)ir_node->op.param_mem;
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
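// convert the NCHW concat axis into TIM-VX's reversed (innermost-first) dimension order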
auto concat = graph->CreateOperation<tim::vx::ops::Concat>(output_tensor->dim_num - param->axis - 1, ir_node->input_num);
(*concat)
.BindInputs(concat_in_tensor)
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
#include "convolution_param.h"
}
bool VXEngine::AddConvolutionNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_CONV.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
struct conv_param* param = (struct conv_param*)ir_node->op.param_mem;
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[0]);
struct ir_tensor* weight_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[1]);
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
tim::vx::PadType padtype;
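// detect SAME padding: check whether the given pads equal the symmetric padding implied by out = ceil(in / stride), for H and W separately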
int h = input_tensor->dims[2];
int out_h = (h - 1) / param->stride_h + 1;
int total_len_h = (out_h - 1) * param->stride_h + param->kernel_h;
int pad_num_h = total_len_h - h;
int pad_h0 = 0;
if (param->pad_h0 == pad_num_h / 2 && param->pad_h1 == pad_num_h - pad_num_h / 2)
{
pad_h0 = -1;
}
int w = input_tensor->dims[3];
int out_w = (w - 1) / param->stride_w + 1;
int total_len_w = (out_w - 1) * param->stride_w + param->kernel_w;
int pad_num_w = total_len_w - w;
int pad_w0 = 0;
if (param->pad_w0 == pad_num_w / 2 && param->pad_w1 == pad_num_w - pad_num_w / 2)
{
pad_w0 = -1;
}
if (pad_h0 == -1 && pad_w0 == -1)
{
TLOG_INFO("Log:tim::vx::PadType::SAME\n");
padtype = tim::vx::PadType::SAME;
}
else if(param->pad_h0 == 0 && param->pad_w0 == 0)
{
TLOG_INFO("Log:tim::vx::PadType::VALID\n");
padtype = tim::vx::PadType::VALID;
}
int multiplier = 0;
if (param->group == weight_tensor->dims[0])
multiplier = 1;
auto conv = this->graph->CreateOperation<tim::vx::ops::Conv2d>(
weight_tensor->dims[0], padtype,
std::array<uint32_t, 2>({ (unsigned int)param->kernel_h, (unsigned int)param->kernel_w }),
std::array<uint32_t, 2>({ (unsigned int)param->stride_h, (unsigned int)param->stride_w }),
std::array<uint32_t, 2>({ (unsigned int)param->dilation_h, (unsigned int)param->dilation_w }),
multiplier);
if (param->activation >= 0)
{
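// a fused activation cannot be attached to the Conv2d op itself, so route the conv result through a transient tensor and append a Relu/Relu6 node behind it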
tim::vx::Quantization tmp_quant(tim::vx::QuantType::ASYMMETRIC,
output_tensor->scale, output_tensor->zero_point);
tim::vx::ShapeType vx_shape;
std::vector<uint32_t> perm;
for (int i = output_tensor->dim_num - 1; i >= 0; i--)
{
vx_shape.push_back(output_tensor->dims[i]);
perm.push_back(output_tensor->dims[i]);
}
tim::vx::TensorSpec tmp_spec(tim::vx::DataType::UINT8, vx_shape,
tim::vx::TensorAttribute::TRANSIENT,
tmp_quant);
TLOG_INFO("Log:0append relu\n");
auto tmp_output = this->graph->CreateTensor(tmp_spec);
if (ir_node->input_num > 2)
{
TLOG_INFO("Log:Use Bias\n");
struct ir_tensor* bias_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[2]);
(*conv)
.BindInputs({ this->vx_tensor_map[input_tensor->idx], this->vx_tensor_map[weight_tensor->idx], this->vx_tensor_map[bias_tensor->idx] })
.BindOutputs({ tmp_output });
}
else
{
(*conv)
.BindInputs({ this->vx_tensor_map[input_tensor->idx], this->vx_tensor_map[weight_tensor->idx] })
.BindOutputs({ tmp_output });
}
// this->vx_tensor_map[output_tensor->idx] = tmp_output;
if (param->activation == 0)
{
TLOG_INFO("Log:1.1append relu\n");
auto relu = this->graph->CreateOperation<tim::vx::ops::Relu>();
(*relu).BindInput( tmp_output )
.BindOutput({ this->vx_tensor_map[output_tensor->idx] });
}
else if (param->activation == 6)
{
TLOG_INFO("Log:2append relu6\n");
auto relu = this->graph->CreateOperation<tim::vx::ops::Relu6>();
(*relu).BindInput({ tmp_output })
.BindOutput({ this->vx_tensor_map[output_tensor->idx] });
}
}
else
{
if (ir_node->input_num > 2)
{
TLOG_INFO("Log:Use Bias\n");
struct ir_tensor* bias_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[2]);
(*conv)
.BindInputs({ this->vx_tensor_map[input_tensor->idx], this->vx_tensor_map[weight_tensor->idx], this->vx_tensor_map[bias_tensor->idx] })
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
}
else
{
(*conv)
.BindInputs({ this->vx_tensor_map[input_tensor->idx], this->vx_tensor_map[weight_tensor->idx] })
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
}
}
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
#include "concat_param.h"
}
bool VXEngine::AddDropoutNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_DROPOUT.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[0]);
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
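// dropout is an identity at inference time, so lower it to a Reshape onto the output shape (dims reversed for TIM-VX)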
std::vector<uint32_t> perm;
for (int i = output_tensor->dim_num - 1; i >= 0; i--)
{
perm.push_back(output_tensor->dims[i]);
}
auto flatten = graph->CreateOperation<tim::vx::ops::Reshape>(perm);
(*flatten)
.BindInputs({ this->vx_tensor_map[input_tensor->idx] })
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
#include "convolution_param.h"
}
bool VXEngine::AddFullyConnectionNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_FC.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[0]);
struct ir_tensor* weight_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[1]);
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
auto fc = graph->CreateOperation<tim::vx::ops::FullyConnected>(
2, weight_tensor->dims[0]);
if (ir_node->input_num > 2)
{
TLOG_INFO("Log:Use Bias\n");
struct ir_tensor* bias_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[2]);
(*fc)
.BindInputs({this->vx_tensor_map[input_tensor->idx], this->vx_tensor_map[weight_tensor->idx], this->vx_tensor_map[bias_tensor->idx]})
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
}
else
{
(*fc)
.BindInputs({ this->vx_tensor_map[input_tensor->idx], this->vx_tensor_map[weight_tensor->idx] })
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
}
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
}
bool VXEngine::AddFlattenNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_FLATTEN.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[0]);
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
std::vector<uint32_t> perm;
for (int i = output_tensor->dim_num - 1; i >= 0; i--)
{
perm.push_back(output_tensor->dims[i]);
}
auto flatten = graph->CreateOperation<tim::vx::ops::Reshape>(perm);
(*flatten)
.BindInputs({ this->vx_tensor_map[input_tensor->idx] })
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
#include "pooling_param.h"
}
bool VXEngine::AddPoolingNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_POOL.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
struct pool_param* param = (struct pool_param*)ir_node->op.param_mem;
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[0]);
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
tim::vx::PoolType pooltype;
if (param->pool_method == 0)
{
pooltype = tim::vx::PoolType::MAX;
}
else
{
pooltype = tim::vx::PoolType::AVG;
}
tim::vx::PadType padtype;
int h = input_tensor->dims[2];
int out_h = (h - 1) / param->stride_h + 1;
int total_len_h = (out_h - 1) * param->stride_h + param->kernel_h;
int pad_num_h = total_len_h - h;
int pad_h0 = 0;
if (param->pad_h0 == pad_num_h / 2 && param->pad_h1 == pad_num_h - pad_num_h / 2)
{
pad_h0 = -1;
}
int w = input_tensor->dims[3];
int out_w = (w - 1) / param->stride_w + 1;
int total_len_w = (out_w - 1) * param->stride_w + param->kernel_w;
int pad_num_w = total_len_w - w;
int pad_w0 = 0;
if (param->pad_w0 == pad_num_w / 2 && param->pad_w1 == pad_num_w - pad_num_w / 2)
{
pad_w0 = -1;
}
if (pad_h0 == -1 && pad_w0 == -1)
{
TLOG_INFO("Log:tim::vx::PadType::SAME\n");
padtype = tim::vx::PadType::SAME;
}
else if(param->pad_h0 == 0 && param->pad_w0 == 0)
{
TLOG_INFO("Log:tim::vx::PadType::VALID\n");
padtype = tim::vx::PadType::VALID;
}
auto pool = graph->CreateOperation<tim::vx::ops::Pool2d>(
pooltype, padtype,
std::array<uint32_t, 2>({ (unsigned int)param->kernel_h, (unsigned int)param->kernel_w}),
std::array<uint32_t, 2>({(unsigned int)param->stride_h, (unsigned int)param->stride_w}));
(*pool).BindInputs({ this->vx_tensor_map[input_tensor->idx] })
.BindOutputs({ this->vx_tensor_map[output_tensor->idx] });
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#include "timvx_executor.hpp"
extern "C"
{
#include "tengine_op.h"
}
bool VXEngine::AddReluNode(struct ir_node* ir_node)
{
TLOG_INFO("Tengine TIM-VX: Support OP(%d) OP_RELU.\n", ir_node->idx);
struct ir_graph* ir_graph = ir_node->graph;
struct ir_tensor* input_tensor = get_ir_graph_tensor(ir_graph, ir_node->input_tensors[0]);
struct ir_tensor* output_tensor = get_ir_graph_tensor(ir_graph, ir_node->output_tensors[0]);
auto relu = this->graph->CreateOperation<tim::vx::ops::Relu>();
(*relu).BindInput( this->vx_tensor_map[input_tensor->idx] )
.BindOutput({ this->vx_tensor_map[output_tensor->idx] });
return true;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: lswang@openailab.com
*/
extern "C"
{
#include "vector.h"
#include "nn_device.h"
#include "tengine_ir.h"
#include "tengine_log.h"
#include "tengine_errno.h"
#include "dev_allocator.h"
#include "tengine_c_api.h"
}
#include "timvx_device.hpp"
#include "timvx_limit.hpp"
#include "timvx_graph.hpp"
extern "C"
{
int timvx_describe(struct dev_allocator* allocator, struct vector* allowed_ops, struct vector* blocked_ops, struct vector* precision);
int timvx_evaluation(struct dev_allocator* allocator, struct subgraph* sub_graph, struct vector* evolution_tensors, struct vector* evolution_nodes);
int timvx_allocate(struct dev_allocator* allocator, struct subgraph* sub_graph);
int timvx_release(struct dev_allocator* allocator, struct subgraph* sub_graph);
}
int timvx_describe(struct dev_allocator* allocator, struct vector* allowed_ops, struct vector* blocked_ops, struct vector* precision)
{
(void)allocator;
for (int op_type : timvx_supported_ops)
{
push_vector_data(allowed_ops, &op_type);
}
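/* every built-in op that is not listed in timvx_supported_ops is reported back as blocked */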
for (int i = 0, j = 0; i < OP_BUILTIN_LAST; i++)
{
int op_type = timvx_supported_ops[j];
if (op_type != i)
{
push_vector_data(blocked_ops, &i);
}
else
{
if (j < sizeof(timvx_supported_ops) / sizeof(timvx_supported_ops[0]))
j++;
}
}
int precision_var = TENGINE_DT_UINT8;
push_vector_data(precision, &precision_var);
precision_var = TENGINE_DT_FP16;
push_vector_data(precision, &precision_var);
precision_var = TENGINE_DT_FP32;
push_vector_data(precision, &precision_var);
return 0;
}
int timvx_evaluation(struct dev_allocator* allocator, struct subgraph* sub_graph, struct vector* evolution_tensors, struct vector* evolution_nodes)
{
// nothing to do for the TIM-VX device
(void)allocator;
(void)sub_graph;
(void)evolution_tensors;
(void)evolution_nodes;
return 0;
}
int timvx_allocate(struct dev_allocator* allocator, struct subgraph* sub_graph)
{
if (nullptr == allocator)
{
set_tengine_errno(EBADSLT);
return -1;
}
if (0 != strcmp(TIMVX_DEV_NAME, allocator->name))
{
set_tengine_errno(EBADSLT);
return -1;
}
/* set the correct input wait count: INPUT tensor is always ready */
sub_graph->input_wait_count = 0;
for (int i = 0; i < sub_graph->input_num; i++)
{
struct ir_tensor* tensor = get_ir_graph_tensor(sub_graph->graph, sub_graph->input_tensor_list[i]);
if (tensor->tensor_type == TENSOR_TYPE_VAR)
sub_graph->input_wait_count++;
}
return 0;
}
int timvx_release(struct dev_allocator* allocator, struct subgraph* sub_graph)
{
(void)sub_graph;
if (nullptr == allocator || 0 != strcmp(TIMVX_DEV_NAME, allocator->name))
{
return -1;
}
return 0;
}
extern "C"
{
static struct timvx_device timvx_dev = {
.base = {
.name = TIMVX_DEV_NAME,
.init = timvx_dev_init,
.prerun = timvx_dev_prerun,
.run = timvx_dev_run,
.postrun = timvx_dev_postrun,
.async_run = nullptr,
.async_wait = nullptr,
.release = timvx_dev_release,
.release_exec_graph = nullptr,},
.load_graph = nullptr,
.load_ir_graph = nullptr,
.unload_graph = nullptr,
};
static struct dev_allocator timvx_allocator = {
.name = TIMVX_DEV_NAME,
.describe = timvx_describe,
.evaluation = timvx_evaluation,
.allocate = timvx_allocate,
.release = timvx_release,
};
int register_timvx_device(void)
{
TLOG_INFO("Tengine plugin device %s is registered.\n", timvx_dev.base.name);
return register_nn_device(&timvx_dev.base);
}
#ifdef STANDLONE_MODE
void register_timvx_allocator(void)
#else
static void register_timvx_allocator(void)
#endif
{
TLOG_INFO("Tengine plugin allocator %s is registered.\n", timvx_allocator.name);
init_allocator_registry(&timvx_allocator);
}
#ifndef STANDLONE_MODE
REGISTER_NN_DEVICE(&timvx_dev.base);
REGISTER_DEV_ALLOCATOR(register_timvx_allocator);
#endif
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: lswang@openailab.com
*/
#ifndef __TIMVX_DEVICE_H__
#define __TIMVX_DEVICE_H__
#define TIMVX_DEV_NAME "TIMVX"
extern "C"
{
#include "tengine_c_api.h"
struct timvx_device
{
struct nn_device base;
int (*load_graph)(struct timvx_device* dev);
int (*load_ir_graph)(struct timvx_device* dev);
int (*unload_graph)(struct timvx_device* dev);
};
DLLEXPORT int register_timvx_device(void);
}
#endif
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: lswang@openailab.com
*/
#include "timvx_executor.hpp"
#include "timvx_helper.hpp"
extern "C"
{
#include "tengine_op.h"
#include "tengine_log.h"
}
#define DEFAULT_DEVICE_ID 0
#define DEFAULT_MAX_BATCH 128
VXEngine::VXEngine()
{
this->context = tim::vx::Context::Create();
this->graph = context->CreateGraph();
};
void VXEngine::VXTensorMap(struct ir_graph* ir_graph, int ir_tensor_idx, int spec_type)
{
auto iter = this->vx_tensor_map.find(ir_tensor_idx);
TLOG_INFO("Log:ir_tensor_idx %d\n",ir_tensor_idx);
TLOG_INFO("Log:#### 001 %d\n",spec_type);
if (this->vx_tensor_map.end() == iter)
{
struct ir_tensor* ir_tensor = get_ir_graph_tensor(ir_graph, ir_tensor_idx);
unsigned int* Dims = (unsigned int*)ir_tensor->dims;
tim::vx::DataType datatype;
switch(ir_tensor->data_type)
{
case (1):
TLOG_INFO("Log:tim::vx::DataType::FLOAT16\n");
datatype = tim::vx::DataType::FLOAT16;
break;
case (3):
TLOG_INFO("Log:tim::vx::DataType::UINT8\n");
datatype = tim::vx::DataType::UINT8;
break;
case (4):
TLOG_INFO("Log:tim::vx::DataType::INT32\n");
datatype = tim::vx::DataType::INT32;
break;
default:
fprintf(stderr,"Don't support this date type(%d)\n",ir_tensor->data_type);
break;
}
tim::vx::ShapeType vx_shape;
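// TIM-VX stores shapes with the innermost dimension first, so the NCHW dims are pushed in reverse order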
TLOG_INFO("Log:ir_tensor->dim_num %d\n",ir_tensor->dim_num);
struct ir_node* ir_node = get_ir_graph_node(ir_graph, ir_tensor->producer);
if (ir_node->op.op_type == OP_FC && ir_node->output_tensors[0] == ir_tensor_idx)
{
for (int i = 1; i >= 0; i--)
{
vx_shape.push_back(Dims[i]);
}
}
else
{
for (int i = ir_tensor->dim_num - 1; i >= 0; i--)
{
vx_shape.push_back(Dims[i]);
}
}
tim::vx::Quantization vx_quant(tim::vx::QuantType::ASYMMETRIC, ir_tensor->scale,
ir_tensor->zero_point);
std::shared_ptr<tim::vx::Tensor> vx_tensor;
TLOG_INFO("Log:#### 010 %d\n",spec_type);
if (spec_type == SPEC_TYPE_OUTPUT)
{
tim::vx::TensorSpec vx_spec(datatype, vx_shape,
tim::vx::TensorAttribute::OUTPUT, vx_quant);
vx_tensor = this->graph->CreateTensor(vx_spec);
}
else if (spec_type == SPEC_TYPE_DWCONV)
{
TLOG_INFO("Log:#### 111 SPEC_TYPE_DWCONV\n");
vx_shape[ir_tensor->dim_num - 2] = vx_shape[ir_tensor->dim_num - 1];
vx_shape[ir_tensor->dim_num - 1] = 1;
tim::vx::TensorSpec vx_spec(datatype, vx_shape,
tim::vx::TensorAttribute::CONSTANT, vx_quant);
vx_tensor = this->graph->CreateTensor(vx_spec, ir_tensor->data);
}
else if (ir_tensor->tensor_type == TENSOR_TYPE_INPUT )
{
tim::vx::TensorSpec vx_spec(datatype, vx_shape,
tim::vx::TensorAttribute::INPUT, vx_quant);
vx_tensor = this->graph->CreateTensor(vx_spec);
}
else if (ir_tensor->tensor_type == TENSOR_TYPE_VAR)
{
tim::vx::TensorSpec vx_spec(datatype, vx_shape,
tim::vx::TensorAttribute::TRANSIENT, vx_quant);
vx_tensor = this->graph->CreateTensor(vx_spec);
}
else if (ir_tensor->tensor_type == TENSOR_TYPE_CONST)
{
tim::vx::TensorSpec vx_spec(datatype, vx_shape,
tim::vx::TensorAttribute::CONSTANT, vx_quant);
vx_tensor = this->graph->CreateTensor(vx_spec, ir_tensor->data);
}
this->vx_tensor_map[ir_tensor_idx] = vx_tensor;
}
TLOG_INFO("\n");
}
int VXEngine::Build(struct subgraph* subgraph)
{
struct ir_graph* ir_graph = subgraph->graph;
for (int i = 0; i < subgraph->node_num; i++)
{
uint16_t node_id = subgraph->node_list[i];
struct ir_node* ir_node = get_ir_graph_node(ir_graph, node_id);
auto op_type = ir_node->op.op_type;
switch (op_type)
{
case OP_CLIP:
this->AddClipNode(ir_node);
break;
case OP_CONCAT:
this->AddConcatNode(ir_node);
break;
case OP_CONST:
case OP_INPUT:
continue;
case OP_CONV:
this->AddConvolutionNode(ir_node);
break;
case OP_DROPOUT:
this->AddDropoutNode(ir_node);
break;
case OP_ELTWISE:
this->AddEltwisSumNode(ir_node);
break;
case OP_FC:
this->AddFullyConnectionNode(ir_node);
break;
case OP_FLATTEN:
this->AddFlattenNode(ir_node);
break;
// case OP_PERMUTE:
// this->AddPermuteNode(ir_graph, ir_node);
// break;
case OP_POOL:
this->AddPoolingNode(ir_node);
break;
case OP_RELU:
this->AddReluNode(ir_node);
break;
// case OP_RESHAPE:
// this->AddReshapeNode(ir_graph, ir_node);
// break;
// case OP_SLICE:
// this->AddSliceNode(ir_graph, ir_node);
// break;
// case OP_SOFTMAX:
// this->AddSoftmaxNode(ir_graph, ir_node);
default:
fprintf(stderr, "Tengine TIM-VX: Cannot support OP(%d).\n", ir_node->idx);
break;
}
}
return 0;
}
int VXEngine::VXEnginePreRun(struct subgraph* subgraph)
{
struct ir_graph* ir_graph = subgraph->graph;
/* Add TIM-VX Tensor */
TLOG_INFO("Log:subgraph->node_num %d\n", subgraph->node_num);
for (uint8_t i = 0; i < subgraph->output_num; i++)
{
int ir_tensor_idx = subgraph->output_tensor_list[i];
this->VXTensorMap(ir_graph, ir_tensor_idx, SPEC_TYPE_OUTPUT);
}
for (int i = 0; i < subgraph->node_num; i++)
{
uint16_t node_id = subgraph->node_list[i];
struct ir_node* ir_node = get_ir_graph_node(ir_graph, node_id);
if (ir_node->op.op_type == OP_CONV)
{
struct conv_param* conv_param = ( struct conv_param* )ir_node->op.param_mem;
if (conv_param->group == conv_param->output_channel)
{
TLOG_INFO("Log:#### 000 SPEC_TYPE_DWCONV\n");
this->VXTensorMap(ir_graph, ir_node->input_tensors[1], SPEC_TYPE_DWCONV);
}
}
}
for (int i = 0; i < subgraph->node_num; i++)
{
uint16_t node_id = subgraph->node_list[i];
struct ir_node* ir_node = get_ir_graph_node(ir_graph, node_id);
for (int j = 0; j < ir_node->input_num; j++)
{
int ir_tensor_idx = ir_node->input_tensors[j];
this->VXTensorMap(ir_graph, ir_tensor_idx, 0);
}
for (int j = 0; j < ir_node->output_num; j++)
{
int ir_tensor_idx = ir_node->output_tensors[j];
this->VXTensorMap(ir_graph, ir_tensor_idx, 0);
}
}
/* Add TIM-VX Node */
this->Build(subgraph);
// fprintf(stderr,"subgraph->node_num %d\n",subgraph->node_num);
if (subgraph->node_num > 0)
{
if (!this->graph->Compile()) {
std::cout << "\nCompile graph fail." << std::endl;
return -1;
}
}
return 0;
};
int VXEngine::VXEngineRun(struct subgraph* subgraph)
{
struct ir_graph* ir_graph = subgraph->graph;
/* upload data */
// fprintf(stderr,"subgraph->input_num %d\n",subgraph->input_num);
if (subgraph->input_num > 0)
{
for (uint8_t i = 0; i < subgraph->input_num; i++)
{
int ir_tensor_idx = subgraph->input_tensor_list[i];
struct ir_tensor* ir_tensor = get_ir_graph_tensor(ir_graph, ir_tensor_idx);
if (!this->vx_tensor_map[ir_tensor_idx]->CopyDataToTensor(ir_tensor->data, ir_tensor->elem_num * ir_tensor->elem_size)) {
std::cout << "Copy input data fail." << std::endl;
return -1;
}
}
if (!this->graph->Run()) {
std::cout << "Run graph fail." << std::endl;
return -1;
}
/* download data */
for (uint8_t i = 0; i < subgraph->output_num; i++)
{
int ir_tensor_idx = subgraph->output_tensor_list[i];
struct ir_tensor* ir_tensor = get_ir_graph_tensor(ir_graph, ir_tensor_idx);
if (ir_tensor->data == NULL)
{
TLOG_INFO("Log:download data is NULL\n");
uint8_t* u8data = (uint8_t*)malloc(ir_tensor->elem_size * ir_tensor->elem_num);
ir_tensor->data = u8data;
}
if (!this->vx_tensor_map[ir_tensor_idx]->CopyDataFromTensor(ir_tensor->data))
{
TLOG_INFO("Log:Copy input data fail\n");
return -1;
}
}
}
return 0;
}
void VXEngine::VXEnginePostRun()
{
};
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: lswang@openailab.com
*/
#ifndef __TIMVX_TIMVX_EXECUTOR_HPP__
#define __TIMVX_TIMVX_EXECUTOR_HPP__
extern "C"
{
#include "tengine_ir.h"
#include "tengine_log.h"
}
#include <map>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <tuple>
#include <vector>
#include "tim/vx/context.h"
#include "tim/vx/graph.h"
#include "tim/vx/operation.h"
#include "tim/vx/ops/activations.h"
#include "tim/vx/ops/concat.h"
#include "tim/vx/ops/conv2d.h"
#include "tim/vx/ops/elementwise.h"
#include "tim/vx/ops/fullyconnected.h"
#include "tim/vx/ops/pool2d.h"
#include "tim/vx/ops/reshape.h"
#include "tim/vx/ops/softmax.h"
#include "tim/vx/tensor.h"
#include "convolution_param.h"
#define SPEC_TYPE_OUTPUT 1
#define SPEC_TYPE_DWCONV 2
typedef std::map<uint32_t, std::shared_ptr<tim::vx::Tensor>> dict_irt2vxt;
class VXEngine
{
public:
VXEngine();
~VXEngine() = default;
int VXEnginePreRun(struct subgraph* subgraph);
int VXEngineRun(struct subgraph* subgraph);
void VXEnginePostRun();
private:
int Build(struct subgraph* subgraph);
void VXTensorMap(struct ir_graph* ir_graph, int ir_tensor_idx, int spec_type);
bool AddClipNode(struct ir_node* ir_node);
bool AddConcatNode(struct ir_node* ir_node);
bool AddConvolutionNode(struct ir_node* ir_node);
bool AddDropoutNode(struct ir_node* ir_node);
bool AddEltwisSumNode(struct ir_node* ir_node);
bool AddFlattenNode(struct ir_node* ir_node);
bool AddFullyConnectionNode(struct ir_node* node);
bool AddPoolingNode(struct ir_node* ir_node);
bool AddReluNode(struct ir_node* ir_node);
public:
std::shared_ptr<tim::vx::Context> context;
std::shared_ptr<tim::vx::Graph> graph;
std::shared_ptr<tim::vx::Operation> ops;
private:
dict_irt2vxt vx_tensor_map;
dict_irt2vxt vx_node_map;
};
#endif
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: lswang@openailab.com
*/
#include "timvx_graph.hpp"
#include "timvx_executor.hpp"
extern "C"
{
#include "nn_device.h"
}
int timvx_dev_init(struct nn_device* dev)
{
(void)dev;
return 0;
}
int timvx_dev_prerun(struct nn_device* dev, struct subgraph* subgraph, int num_thread, int cpu_affinity, int mode)
{
fprintf(stderr,"TIM-VX prerun.\n");
subgraph->exec_graph = new VXEngine;
auto engine = (VXEngine*)subgraph->exec_graph;
return engine->VXEnginePreRun(subgraph);
}
int timvx_dev_run(struct nn_device* dev, struct subgraph* subgraph)
{
auto engine = (VXEngine*)subgraph->exec_graph;
return engine->VXEngineRun(subgraph);
}
int timvx_dev_postrun(struct nn_device* dev, struct subgraph* subgraph)
{
auto engine = (VXEngine*)subgraph->exec_graph;
engine->VXEnginePostRun();
delete engine;
return 0;
}
int timvx_dev_release(struct nn_device* dev)
{
(void)dev;
return 0;
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: lswang@openailab.com
*/
#ifndef __TIMVX_TIMVX_GRAPH_HPP__
#define __TIMVX_TIMVX_GRAPH_HPP__
extern "C"
{
#include "nn_device.h"
#include "tengine_ir.h"
int timvx_dev_init(struct nn_device* dev);
int timvx_dev_prerun(struct nn_device* dev, struct subgraph* subgraph, int num_thread, int cpu_affinity, int mode);
int timvx_dev_run(struct nn_device* dev, struct subgraph* subgraph);
int timvx_dev_postrun(struct nn_device* dev, struct subgraph* subgraph);
int timvx_dev_release(struct nn_device* dev);
}
#endif
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: lswang@openailab.com
*/
#pragma once
#include <memory>
#include <sstream>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <string>
#ifdef _MSC_VER
#define FN_NAME __FUNCTION__
#else
#define FN_NAME __func__
#endif
#if (!defined(__ANDROID__) && defined(__aarch64__)) || defined(__QNX__)
#define ENABLE_DLA_API 1
#endif
#define CHECK(status) \
do \
{ \
auto ret = (status); \
if (ret != 0) \
{ \
Log(Loglevel, "TensorRT Engine", "Cuda failure: %d", ret); \
abort(); \
} \
} while (0)
constexpr long double operator"" _GiB(long double val)
{
return val * (1 << 30);
}
constexpr long double operator"" _MiB(long double val) { return val * (1 << 20); }
constexpr long double operator"" _KiB(long double val) { return val * (1 << 10); }
// These are necessary if we want to be able to write 1_GiB instead of 1.0_GiB.
// Since the return type is signed, -1_GiB will work as expected.
constexpr long long int operator"" _GiB(long long unsigned int val) { return val * (1 << 30); }
constexpr long long int operator"" _MiB(long long unsigned int val) { return val * (1 << 20); }
constexpr long long int operator"" _KiB(long long unsigned int val) { return val * (1 << 10); }
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* License); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*
* Copyright (c) 2021, Open AI Lab
* Author: hhchen@openailab.com
*/
#pragma once
extern "C"
{
#include "tengine_op.h"
}
const int timvx_supported_ops[] = {
OP_CLIP,
OP_CONCAT,
OP_CONST,
OP_CONV,
OP_DROPOUT,
OP_ELTWISE,
OP_FC,
OP_FLATTEN,
OP_INPUT,
// OP_PERMUTE,
OP_POOL,
OP_RELU,
OP_RESHAPE,
OP_SLICE,
OP_SOFTMAX
};