[LITE][OPENCL] Enhance Profiler for OpenCL with in/out/filter shape, macs/macs_ps, real backend kernel etc. !3641
Created by: ysh329
Status: awaiting review
Main content
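Profiler enhancements, including but not limited to the following:
- Both runtime and non-runtime Profiler information;
- Runtime Profiler information is split into two levels, Op and Kernel. The main goal is to reduce code size: the kernel layer does not have to re-implement the Op-level runtime Profiler information for every kernel;
- Op-level Profiler information includes, but is not limited to:
  - Each op's `.h` file under `lite/operators/` needs to implement its own `SetProfileRuntimeOpInfo` method, which provides (among others) `MACs / MACs/second / InputDim / OutputDim / FilterDim / Remark` (for example `3x3s1p1g1d1` for a convolution);
- Kernel-level Profiler information includes, but is not limited to:
  - The name of the backend kernel that actually runs, for example OpenCL's `conv1x1opt`. Each backend needs to add and complete the `SetProfileRuntimeKernelInfo` method under its own kernels.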
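Because collecting this information affects the measured time, the first run is skipped as well: a variable similar to `is_first_epoch` is used to skip recording the non-timing profiler information.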
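One possible reading of the first-epoch note above, as a minimal sketch rather than the code in this PR: the non-timing fields (remark, kernel name) are collected once on the first epoch, that epoch's timing is dropped because collection perturbs it, and later epochs record timing only. `ProfileRecord`, `ConvRemark` and `RecordRun` are hypothetical names introduced for illustration; only `is_first_epoch` comes from the description. The remark follows the field order seen in the tables of this PR (`3x3p1s1g32d1`).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record type; the field names are illustrative only.
struct ProfileRecord {
  std::string remark;            // Op-level, e.g. "3x3p1s1g32d1"
  std::string kernel_name;       // Kernel-level, e.g. "depth_conv2d_3x3s1"
  std::vector<double> times_ms;  // timings of the runs that are kept
  bool is_first_epoch = true;
};

// Builds a remark in the order used by the tables in this PR:
// <kh>x<kw>p<pad>s<stride>g<groups>d<dilation>.
std::string ConvRemark(int kh, int kw, int pad, int stride, int groups,
                       int dilation) {
  return std::to_string(kh) + "x" + std::to_string(kw) + "p" +
         std::to_string(pad) + "s" + std::to_string(stride) + "g" +
         std::to_string(groups) + "d" + std::to_string(dilation);
}

// On the first epoch, collect the non-timing fields and drop the timing,
// since collecting them perturbs it. Later epochs only record the time.
void RecordRun(ProfileRecord* rec,
               const std::function<void(ProfileRecord*)>& fill_static_info,
               double elapsed_ms) {
  assert(rec != nullptr);
  if (rec->is_first_epoch) {
    fill_static_info(rec);
    rec->is_first_epoch = false;
    return;
  }
  rec->times_ms.push_back(elapsed_ms);
}
```

With this shape, the Op-level and Kernel-level setters can both be called from the single `fill_static_info` callback, so the per-run cost after the first epoch is one timing append.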
profiler tf_mobilenetv1
repeats=1000, warmup=20
===== Detailed Dispatch Profiler Summary: N/A, Exclude 1 warm-ups =====
OperatorType KerneAttr KernelName Remark InDim FilterDim OutDim Avg(ms) Min(ms) Max(ms) Last(ms) Avg(%) GOPs GOPS clAvg(ms) clMin(ms) clMax(ms) clAvg(%)
io_copy opencl/any/any HostToOpenCL type0 1x3x224x224 N/A 1x3x224x224 0.565 0.090 1.698 0.692 4.64% 0.00 0.00 0.043 0.043 0.043 0.57%
layout opencl/any/ImageDefault buffer_to_image2d type0 1x3x224x224 N/A 1x3x224x224 0.371 0.055 0.946 0.354 3.05% 0.00 0.00 0.059 0.059 0.059 0.78%
conv2d opencl/float16/ImageDefault conv2d_3x3_opt 3x3p0s2g1d1 1x3x224x224 32x3x3x3 1x32x112x112 0.233 0.032 0.687 0.226 1.91% 0.02 93.12 0.260 0.260 0.260 3.44%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g32d1 1x32x112x112 32x1x3x3 1x32x112x112 1.836 0.526 2.590 1.764 15.09% 0.01 3.94 0.173 0.173 0.173 2.28%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x32x112x112 64x32x1x1 1x64x112x112 0.224 0.053 0.605 0.222 1.84% 0.05 229.52 0.349 0.349 0.349 4.61%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g64d1 1x64x112x112 64x1x3x3 1x64x56x56 0.197 0.055 0.615 0.200 1.62% 0.00 18.36 0.160 0.160 0.160 2.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x64x56x56 128x64x1x1 1x128x56x56 0.212 0.059 0.569 0.208 1.74% 0.05 242.47 0.304 0.304 0.304 4.02%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g128d1 1x128x56x56 128x1x3x3 1x128x56x56 1.588 0.393 2.916 1.969 13.06% 0.01 4.55 0.179 0.179 0.179 2.36%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x128x56x56 128x128x1x1 1x128x56x56 0.215 0.055 0.584 0.217 1.77% 0.10 478.65 0.621 0.621 0.621 8.20%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g128d1 1x128x56x56 128x1x3x3 1x128x28x28 0.196 0.055 0.594 0.204 1.61% 0.00 9.21 0.114 0.114 0.114 1.51%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x128x28x28 256x128x1x1 1x256x28x28 0.209 0.058 0.515 0.217 1.72% 0.05 245.61 0.268 0.268 0.268 3.54%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g256d1 1x256x28x28 256x1x3x3 1x256x28x28 0.197 0.073 0.457 0.203 1.62% 0.00 18.30 0.088 0.088 0.088 1.16%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x256x28x28 256x256x1x1 1x256x28x28 0.210 0.074 0.546 0.209 1.73% 0.10 489.57 0.528 0.528 0.528 6.98%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g256d1 1x256x28x28 256x1x3x3 1x256x14x14 0.199 0.070 1.283 0.204 1.63% 0.00 4.54 0.091 0.091 0.091 1.20%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x256x14x14 512x256x1x1 1x512x14x14 0.212 0.073 0.628 0.227 1.74% 0.05 242.86 0.253 0.253 0.253 3.35%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g512d1 1x512x14x14 512x1x3x3 1x512x14x14 0.202 0.071 0.638 0.211 1.66% 0.00 8.96 0.048 0.048 0.048 0.64%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x512x14x14 512x512x1x1 1x512x14x14 0.213 0.075 0.544 0.221 1.75% 0.10 481.68 0.503 0.503 0.503 6.65%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g512d1 1x512x14x14 512x1x3x3 1x512x14x14 0.204 0.068 0.594 0.214 1.68% 0.00 8.86 0.049 0.049 0.049 0.65%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x512x14x14 512x512x1x1 1x512x14x14 0.222 0.075 0.793 0.224 1.82% 0.10 463.45 0.492 0.492 0.492 6.50%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g512d1 1x512x14x14 512x1x3x3 1x512x14x14 0.209 0.071 0.867 0.212 1.72% 0.00 8.63 0.048 0.048 0.048 0.64%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x512x14x14 512x512x1x1 1x512x14x14 0.223 0.077 0.733 0.281 1.84% 0.10 460.12 0.491 0.491 0.491 6.49%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g512d1 1x512x14x14 512x1x3x3 1x512x14x14 0.211 0.072 0.726 0.223 1.74% 0.00 8.54 0.048 0.048 0.048 0.64%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x512x14x14 512x512x1x1 1x512x14x14 0.222 0.074 0.672 0.222 1.82% 0.10 463.01 0.494 0.494 0.494 6.53%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g512d1 1x512x14x14 512x1x3x3 1x512x14x14 0.209 0.072 0.772 0.208 1.72% 0.00 8.64 0.049 0.049 0.049 0.65%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x512x14x14 512x512x1x1 1x512x14x14 0.222 0.081 0.862 0.220 1.83% 0.10 462.09 0.503 0.503 0.503 6.65%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g512d1 1x512x14x14 512x1x3x3 1x512x7x7 0.208 0.077 2.002 0.210 1.71% 0.00 2.17 0.050 0.050 0.050 0.66%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x512x7x7 1024x512x1x1 1x1024x7x7 0.223 0.072 0.591 0.222 1.83% 0.05 230.34 0.257 0.257 0.257 3.40%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g1024d1 1x1024x7x7 1024x1x3x3 1x1024x7x7 0.211 0.071 0.552 0.209 1.74% 0.00 4.28 0.026 0.026 0.026 0.35%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x1024x7x7 1024x1024x1x1 1x1024x7x7 0.220 0.069 0.555 0.226 1.81% 0.10 467.46 0.530 0.530 0.530 7.00%
pool2d opencl/float16/ImageDefault pool_avg avg7x7s2p0VALID 1x1024x7x7 N/A 1x1024x1x1 0.246 0.081 0.838 0.245 2.03% 0.00 0.20 0.031 0.031 0.031 0.41%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x1024x1x1 1001x1024x1x1 1x1001x1x1 0.238 0.073 0.517 0.236 1.95% 0.00 8.63 0.338 0.338 0.338 4.47%
layout opencl/any/NCHW image2d_to_buffer type0 1x1001x1x1 N/A 1x1001x1x1 0.223 0.076 0.607 0.225 1.83% 0.00 0.00 0.008 0.008 0.008 0.11%
io_copy opencl/any/any OpenCLToHost type0 1x1001x1x1 N/A 1x1001x1x1 0.085 0.031 0.645 0.082 0.70% 0.00 0.00 0.023 0.023 0.023 0.30%
squeeze2 arm/float/NCHW NotImpl N/A 1x1001x1x1 N/A 1x1001 0.013 0.004 0.031 0.013 0.11% 0.00 0.00 0.000 0.000 0.000 0.00%
io_copy opencl/any/any HostToOpenCL type0 1x1001 N/A 1x1001 0.047 0.010 0.221 0.047 0.39% 0.00 0.00 0.004 0.004 0.004 0.05%
layout opencl/any/ImageDefault buffer_to_image2d type0 1x1001 N/A 1x1001 0.224 0.085 0.658 0.222 1.84% 0.00 0.00 0.008 0.008 0.008 0.11%
reshape2 opencl/float16/ImageDefault reshape N/A 1x1001 N/A 1x1001 0.259 0.079 0.633 0.262 2.13% 0.00 0.00 0.018 0.018 0.018 0.24%
layout opencl/any/NCHW image2d_to_buffer type0 1x1001 N/A 1x1001 0.203 0.072 0.720 0.240 1.67% 0.00 0.00 0.007 0.007 0.007 0.09%
io_copy opencl/any/any OpenCLToHost type0 1x1001 N/A 1x1001 0.079 0.027 0.719 0.080 0.65% 0.00 0.00 0.012 0.012 0.012 0.16%
softmax arm/float/NCHW NotImpl axis1 1x1001 N/A 1x1001 0.091 0.017 0.528 0.088 0.75% 0.00 0.07 0.000 0.000 0.000 0.00%
io_copy opencl/any/any HostToOpenCL type0 1x1001 N/A 1x1001 0.049 0.010 0.430 0.050 0.40% 0.00 0.00 0.004 0.004 0.004 0.05%
layout opencl/any/ImageDefault buffer_to_image2d type0 1x1001 N/A 1x1001 0.216 0.074 0.662 0.222 1.78% 0.00 0.00 0.006 0.006 0.006 0.08%
reshape2 opencl/float16/ImageDefault reshape N/A 1x1001 N/A 1x1001 0.248 0.082 1.097 0.243 2.04% 0.00 0.00 0.015 0.015 0.015 0.20%
layout opencl/any/NCHW image2d_to_buffer type0 1x1001 N/A 1x1001 0.200 0.067 0.576 0.200 1.64% 0.00 0.00 0.006 0.006 0.006 0.08%
io_copy opencl/any/any OpenCLToHost type0 1x1001 N/A 1x1001 0.079 0.028 0.652 0.077 0.65% 0.00 0.00 0.009 0.009 0.009 0.12%
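The GOPs and GOPS columns above can be cross-checked from FilterDim, OutDim and Avg(ms). The sketch below assumes two operations per MAC and GOPS = GOPs / average seconds; this is inferred from the printed numbers, not taken from the profiler source, and the function names are hypothetical. For the first `conv2d_1x1_simple` row (filter `64x32x1x1`, output `1x64x112x112`, Avg 0.224 ms) it gives about 0.05 GOPs and about 229 GOPS, close to the printed 229.52; the small gap is expected because Avg(ms) is rounded in the table.

```cpp
#include <cassert>
#include <cstdint>

// MACs of a (grouped) 2-D convolution with output N x Cout x Hout x Wout
// and filter Cout x (Cin / groups) x Kh x Kw.
std::int64_t ConvMacs(std::int64_t n, std::int64_t c_out, std::int64_t h_out,
                      std::int64_t w_out, std::int64_t c_in_per_group,
                      std::int64_t kh, std::int64_t kw) {
  assert(n > 0 && c_out > 0 && h_out > 0 && w_out > 0);
  assert(c_in_per_group > 0 && kh > 0 && kw > 0);
  return n * c_out * h_out * w_out * c_in_per_group * kh * kw;
}

// Assumption: one MAC is counted as two operations (multiply + add).
double Gops(std::int64_t macs) { return 2.0 * static_cast<double>(macs) / 1e9; }

// Throughput in giga-operations per second for an average time in ms.
double GopsPerSecond(std::int64_t macs, double avg_ms) {
  assert(avg_ms > 0.0);
  return Gops(macs) / (avg_ms / 1e3);
}
```

For a depthwise convolution, `c_in_per_group` is 1, which is why the `depthwise_conv2d` rows show far fewer GOPs than the neighbouring 1x1 convolutions.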
profiler tf_mobilenetv2
repeats=1000, warmup=20
===== Detailed Dispatch Profiler Summary: N/A, Exclude 1 warm-ups =====
OperatorType KerneAttr KernelName Remark InDim FilterDim OutDim Avg(ms) Min(ms) Max(ms) Last(ms) Avg(%) GOPs GOPS clAvg(ms) clMin(ms) clMax(ms) clAvg(%)
io_copy opencl/any/any HostToOpenCL type0 1x3x224x224 N/A 1x3x224x224 0.536 0.094 1.200 0.561 2.46% 0.00 0.00 0.047 0.047 0.047 0.76%
layout opencl/any/ImageDefault buffer_to_image2d type0 1x3x224x224 N/A 1x3x224x224 0.361 0.059 1.053 0.369 1.65% 0.00 0.00 0.058 0.058 0.058 0.94%
conv2d opencl/float16/ImageDefault conv2d_3x3_opt 3x3p0s2g1d1 1x3x224x224 32x3x3x3 1x32x112x112 0.209 0.032 0.931 0.207 0.96% 0.02 103.94 0.259 0.259 0.259 4.18%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g32d1 1x32x112x112 32x1x3x3 1x32x112x112 0.933 0.269 1.744 0.648 4.27% 0.01 7.75 0.172 0.172 0.172 2.78%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x32x112x112 16x32x1x1 1x16x112x112 1.456 0.453 2.514 1.010 6.67% 0.01 8.82 0.118 0.118 0.118 1.90%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x16x112x112 96x16x1x1 1x96x112x112 0.176 0.053 0.470 0.121 0.81% 0.04 219.04 0.302 0.302 0.302 4.87%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g96d1 1x96x112x112 96x1x3x3 1x96x56x56 0.902 0.272 1.492 0.638 4.13% 0.01 6.01 0.218 0.218 0.218 3.52%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x96x56x56 24x96x1x1 1x24x56x56 0.168 0.056 0.578 0.119 0.77% 0.01 85.83 0.103 0.103 0.103 1.66%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x24x56x56 144x24x1x1 1x144x56x56 0.167 0.059 1.270 0.114 0.76% 0.02 130.03 0.147 0.147 0.147 2.37%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g144d1 1x144x56x56 144x1x3x3 1x144x56x56 1.128 0.339 2.354 0.794 5.17% 0.01 7.20 0.194 0.194 0.194 3.13%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x144x56x56 24x144x1x1 1x24x56x56 0.166 0.057 0.457 0.111 0.76% 0.02 130.57 0.149 0.149 0.149 2.40%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.689 0.301 1.651 0.594 3.16% 0.00 0.00 0.025 0.025 0.025 0.40%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x24x56x56 144x24x1x1 1x144x56x56 0.169 0.066 0.827 0.110 0.78% 0.02 128.08 0.145 0.145 0.145 2.34%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g144d1 1x144x56x56 144x1x3x3 1x144x28x28 0.153 0.068 0.487 0.098 0.70% 0.00 13.28 0.125 0.125 0.125 2.02%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x144x28x28 32x144x1x1 1x32x28x28 0.164 0.069 0.473 0.107 0.75% 0.01 44.14 0.051 0.051 0.051 0.82%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x32x28x28 192x32x1x1 1x192x28x28 0.168 0.071 0.762 0.107 0.77% 0.01 57.31 0.068 0.068 0.068 1.10%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g192d1 1x192x28x28 192x1x3x3 1x192x28x28 0.157 0.067 0.442 0.100 0.72% 0.00 17.28 0.067 0.067 0.067 1.08%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x192x28x28 32x192x1x1 1x32x28x28 0.168 0.070 0.502 0.104 0.77% 0.01 57.30 0.060 0.060 0.060 0.97%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.108 0.051 0.398 0.067 0.50% 0.00 0.00 0.007 0.007 0.007 0.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x32x28x28 192x32x1x1 1x192x28x28 0.170 0.070 0.663 0.103 0.78% 0.01 56.75 0.065 0.065 0.065 1.05%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g192d1 1x192x28x28 192x1x3x3 1x192x28x28 0.667 0.331 1.326 0.397 3.06% 0.00 4.06 0.069 0.069 0.069 1.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x192x28x28 32x192x1x1 1x32x28x28 0.186 0.073 0.717 0.109 0.85% 0.01 51.79 0.062 0.062 0.062 1.00%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.125 0.051 0.446 0.068 0.57% 0.00 0.00 0.009 0.009 0.009 0.14%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x32x28x28 192x32x1x1 1x192x28x28 0.187 0.079 0.536 0.102 0.86% 0.01 51.43 0.069 0.069 0.069 1.11%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g192d1 1x192x28x28 192x1x3x3 1x192x14x14 0.184 0.040 0.607 0.100 0.84% 0.00 3.69 0.039 0.039 0.039 0.63%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x192x14x14 64x192x1x1 1x64x14x14 0.195 0.035 0.490 0.104 0.89% 0.00 24.70 0.054 0.054 0.054 0.87%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x64x14x14 384x64x1x1 1x384x14x14 0.199 0.032 0.526 0.106 0.91% 0.01 48.41 0.057 0.057 0.057 0.93%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g384d1 1x384x14x14 384x1x3x3 1x384x14x14 1.401 0.732 2.149 0.781 6.42% 0.00 0.97 0.043 0.043 0.043 0.69%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x384x14x14 64x384x1x1 1x64x14x14 0.206 0.037 0.721 0.109 0.95% 0.01 46.70 0.100 0.100 0.100 1.62%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.132 0.026 0.490 0.068 0.60% 0.00 0.00 0.007 0.007 0.007 0.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x64x14x14 384x64x1x1 1x384x14x14 0.199 0.033 0.605 0.103 0.91% 0.01 48.51 0.058 0.058 0.058 0.93%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g384d1 1x384x14x14 384x1x3x3 1x384x14x14 0.189 0.030 0.525 0.126 0.87% 0.00 7.17 0.042 0.042 0.042 0.68%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x384x14x14 64x384x1x1 1x64x14x14 0.201 0.033 0.607 0.139 0.92% 0.01 48.02 0.099 0.099 0.099 1.59%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.127 0.032 0.543 0.087 0.58% 0.00 0.00 0.007 0.007 0.007 0.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x64x14x14 384x64x1x1 1x384x14x14 0.196 0.031 0.639 0.135 0.90% 0.01 49.12 0.058 0.058 0.058 0.94%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g384d1 1x384x14x14 384x1x3x3 1x384x14x14 0.191 0.030 0.628 0.135 0.87% 0.00 7.10 0.042 0.042 0.042 0.68%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x384x14x14 64x384x1x1 1x64x14x14 0.202 0.031 0.514 0.137 0.93% 0.01 47.69 0.099 0.099 0.099 1.60%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.129 0.032 0.708 0.088 0.59% 0.00 0.00 0.007 0.007 0.007 0.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x64x14x14 384x64x1x1 1x384x14x14 0.200 0.037 0.633 0.136 0.92% 0.01 48.11 0.057 0.057 0.057 0.92%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g384d1 1x384x14x14 384x1x3x3 1x384x14x14 0.196 0.058 0.799 0.130 0.90% 0.00 6.91 0.042 0.042 0.042 0.68%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x384x14x14 96x384x1x1 1x96x14x14 0.207 0.029 0.532 0.137 0.95% 0.01 69.92 0.099 0.099 0.099 1.59%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x96x14x14 576x96x1x1 1x576x14x14 0.756 0.203 1.319 0.504 3.47% 0.02 28.66 0.105 0.105 0.105 1.69%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g576d1 1x576x14x14 576x1x3x3 1x576x14x14 0.198 0.031 0.532 0.135 0.91% 0.00 10.27 0.054 0.054 0.054 0.87%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x576x14x14 96x576x1x1 1x96x14x14 0.207 0.031 0.650 0.153 0.95% 0.02 104.54 0.147 0.147 0.147 2.37%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.138 0.020 0.439 0.102 0.63% 0.00 0.00 0.008 0.008 0.008 0.13%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x96x14x14 576x96x1x1 1x576x14x14 0.205 0.030 0.460 0.150 0.94% 0.02 105.92 0.103 0.103 0.103 1.66%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g576d1 1x576x14x14 576x1x3x3 1x576x14x14 0.199 0.026 0.562 0.141 0.91% 0.00 10.22 0.051 0.051 0.051 0.82%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x576x14x14 96x576x1x1 1x96x14x14 0.210 0.027 0.578 0.151 0.96% 0.02 103.06 0.145 0.145 0.145 2.34%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.138 0.019 0.474 0.099 0.63% 0.00 0.00 0.008 0.008 0.008 0.13%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x96x14x14 576x96x1x1 1x576x14x14 0.207 0.029 0.670 0.154 0.95% 0.02 104.74 0.104 0.104 0.104 1.68%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3 3x3p0s2g576d1 1x576x14x14 576x1x3x3 1x576x7x7 0.197 0.027 0.556 0.145 0.90% 0.00 2.58 0.055 0.055 0.055 0.88%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x576x7x7 160x576x1x1 1x160x7x7 0.209 0.068 0.645 0.151 0.96% 0.01 43.14 0.136 0.136 0.136 2.19%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x160x7x7 960x160x1x1 1x960x7x7 0.212 0.030 0.909 0.149 0.97% 0.02 70.97 0.086 0.086 0.086 1.39%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g960d1 1x960x7x7 960x1x3x3 1x960x7x7 0.199 0.068 0.554 0.142 0.91% 0.00 4.26 0.027 0.027 0.027 0.43%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x960x7x7 160x960x1x1 1x160x7x7 0.210 0.075 0.484 0.148 0.96% 0.02 71.73 0.222 0.222 0.222 3.58%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.133 0.054 0.419 0.093 0.61% 0.00 0.00 0.007 0.007 0.007 0.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x160x7x7 960x160x1x1 1x960x7x7 0.214 0.076 0.944 0.154 0.98% 0.02 70.41 0.086 0.086 0.086 1.38%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g960d1 1x960x7x7 960x1x3x3 1x960x7x7 0.202 0.087 0.581 0.212 0.93% 0.00 4.19 0.027 0.027 0.027 0.44%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x960x7x7 160x960x1x1 1x160x7x7 0.214 0.095 0.549 0.219 0.98% 0.02 70.50 0.222 0.222 0.222 3.59%
elementwise_add opencl/float16/ImageDefault elementwise_add N/A N/A N/A N/A 0.136 0.057 0.546 0.140 0.62% 0.00 0.00 0.007 0.007 0.007 0.11%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x160x7x7 960x160x1x1 1x960x7x7 0.214 0.095 0.490 0.220 0.98% 0.02 70.35 0.086 0.086 0.086 1.39%
depthwise_conv2d opencl/float16/ImageDefault depth_conv2d_3x3s1 3x3p1s1g960d1 1x960x7x7 960x1x3x3 1x960x7x7 0.202 0.090 0.635 0.210 0.93% 0.00 4.19 0.027 0.027 0.027 0.43%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x960x7x7 320x960x1x1 1x320x7x7 0.214 0.083 0.543 0.218 0.98% 0.03 140.80 0.226 0.226 0.226 3.65%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x320x7x7 1280x320x1x1 1x1280x7x7 0.855 0.510 2.572 0.910 3.92% 0.04 46.93 0.228 0.228 0.228 3.68%
pool2d opencl/float16/ImageDefault pool_avg avg7x7s1p0VALID 1x1280x7x7 N/A 1x1280x1x1 0.252 0.138 0.854 0.262 1.15% 0.00 0.25 0.031 0.031 0.031 0.50%
conv2d opencl/float16/ImageDefault conv2d_1x1_simple 1x1p0s1g1d1 1x1280x1x1 1001x1280x1x1 1x1001x1x1 0.224 0.127 0.566 0.226 1.02% 0.00 11.46 0.383 0.383 0.383 6.18%
layout opencl/any/NCHW image2d_to_buffer type0 1x1001x1x1 N/A 1x1001x1x1 0.226 0.151 0.863 0.236 1.03% 0.00 0.00 0.008 0.008 0.008 0.13%
io_copy opencl/any/any OpenCLToHost type0 1x1001x1x1 N/A 1x1001x1x1 0.085 0.056 0.307 0.089 0.39% 0.00 0.00 0.020 0.020 0.020 0.32%
squeeze2 arm/float/NCHW NotImpl N/A 1x1001x1x1 N/A 1x1001 0.013 0.007 0.461 0.014 0.06% 0.00 0.00 0.000 0.000 0.000 0.00%
io_copy opencl/any/any HostToOpenCL type0 1x1001 N/A 1x1001 0.047 0.019 0.326 0.049 0.22% 0.00 0.00 0.003 0.003 0.003 0.05%
layout opencl/any/ImageDefault buffer_to_image2d type0 1x1001 N/A 1x1001 0.220 0.144 0.541 0.238 1.01% 0.00 0.00 0.008 0.008 0.008 0.13%
reshape2 opencl/float16/ImageDefault reshape N/A 1x1001 N/A 1x1001 0.263 0.162 0.610 0.281 1.21% 0.00 0.00 0.018 0.018 0.018 0.29%
layout opencl/any/NCHW image2d_to_buffer type0 1x1001 N/A 1x1001 0.196 0.124 0.514 0.207 0.90% 0.00 0.00 0.007 0.007 0.007 0.12%
io_copy opencl/any/any OpenCLToHost type0 1x1001 N/A 1x1001 0.076 0.047 0.338 0.083 0.35% 0.00 0.00 0.013 0.013 0.013 0.21%
softmax arm/float/NCHW NotImpl axis1 1x1001 N/A 1x1001 0.090 0.032 0.369 0.091 0.41% 0.00 0.07 0.000 0.000 0.000 0.00%
io_copy opencl/any/any HostToOpenCL type0 1x1001 N/A 1x1001 0.050 0.021 0.228 0.052 0.23% 0.00 0.00 0.003 0.003 0.003 0.05%
layout opencl/any/ImageDefault buffer_to_image2d type0 1x1001 N/A 1x1001 0.216 0.148 0.698 0.237 0.99% 0.00 0.00 0.007 0.007 0.007 0.11%
reshape2 opencl/float16/ImageDefault reshape N/A 1x1001 N/A 1x1001 0.244 0.161 0.705 0.252 1.12% 0.00 0.00 0.015 0.015 0.015 0.24%
layout opencl/any/NCHW image2d_to_buffer type0 1x1001 N/A 1x1001 0.203 0.109 0.763 0.206 0.93% 0.00 0.00 0.007 0.007 0.007 0.11%
io_copy opencl/any/any OpenCLToHost type0 1x1001 N/A 1x1001 0.079 0.041 0.537 0.084 0.36% 0.00 0.00 0.009 0.009 0.009 0.15%