Add test and benchmark
----------------------

It's strongly recommended to add unit tests and micro benchmarks for your
new Op. They are required if you wish to contribute the Op back.


Document the new Op
-------------------

Finally, add an entry for the new Op to the operator table in the documentation.


CPU runtime memory layout
-------------------------

The CPU tensor buffer is organized in the following order:

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Tensor type
      - Buffer
    * - Intermediate input/output
      -
    * - Convolution Filter
      -
    * - Depthwise Convolution Filter
      -
    * - 1-D Argument, length = W
      - W


OpenCL runtime memory layout
----------------------------


Input/Output Tensor
~~~~~~~~~~~~~~~~~~~

The Input/Output Tensor is stored in NHWC format:

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Tensor type
      - Buffer
      - Image size [width, height]
      - Explanation
    * - Channel-Major Input/Output
      - NHWC
      - [W * (C+3)/4, N * H]
      - Default Input/Output format
    * - Height-Major Input/Output
      - NHWC
      - [W * C, N * (H+3)/4]
      - Winograd Convolution format
    * - Width-Major Input/Output
      - NHWC
      - [(W+3)/4 * C, N * H]
      - Winograd Convolution format

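
As a worked illustration of the image sizes above, the following Python sketch
computes [width, height] for each layout. The function name ``image_size`` and
the layout labels are hypothetical, not part of the MACE API, and ``/`` in the
table is read as integer division, so ``(X+3)/4`` becomes ``(x + 3) // 4``.

.. code-block:: python

    def image_size(layout, n, h, w, c):
        """Return [width, height] of the 2D image for an NHWC tensor.

        `layout` is a hypothetical label for one of the three formats.
        """
        def blocks(x):
            # Pixels needed to hold x values at 4 values per pixel: (x+3)/4.
            return (x + 3) // 4

        if layout == "channel_major":
            return [w * blocks(c), n * h]
        if layout == "height_major":
            return [w * c, n * blocks(h)]
        if layout == "width_major":
            return [blocks(w) * c, n * h]
        raise ValueError("unknown layout: %s" % layout)


    # N=1, H=5, W=3, C=6
    print(image_size("channel_major", 1, 5, 3, 6))  # [6, 5]
    print(image_size("height_major", 1, 5, 3, 6))   # [18, 2]
    print(image_size("width_major", 1, 5, 3, 6))    # [6, 5]

Channel-Major packs 4 channels into each pixel, while the two Winograd formats
pack 4 rows or 4 columns instead, which is why exactly one of C, H or W is
divided by 4 (rounding up) in each case.
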
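
Each pixel of the **Image** contains 4 elements. The table below lists the
coordinate relationship between the **Image** and the **Buffer**. In the
formulas, P[...] is an image pixel indexed by [column, row] (the same order as
the [width, height] image size), E[...] is a buffer element, ``/`` is integer
division, ``%`` is the remainder, and k indexes the 4 elements of a pixel.
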

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Tensor type
      - Pixel coordinate relationship
      - Explanation
    * - Channel-Major Input/Output
      - P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=i%W, c=[i/W * 4 + k])}
      - k=[0, 4)
    * - Height-Major Input/Output
      - P[i, j] = {E[n, h, w, c] | (n=j%N, h=[j/N * 4 + k], w=i%W, c=i/W)}
      - k=[0, 4)
    * - Width-Major Input/Output
      - P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=[i%W' * 4 + k], c=i/W')}
      - W'=(W+3)/4, k=[0, 4)

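
The sketch below runs the mapping in the other direction: for an NHWC element
E[n, h, w, c] it returns the pixel column ``i``, pixel row ``j`` and lane ``k``.
The function names and layout labels are hypothetical. Each branch is written so
that the pixel coordinates stay within the image size listed for that layout,
and ``check_layout`` verifies that every element lands in a distinct, in-bounds
slot, which is a convenient sanity check for any layout formula.

.. code-block:: python

    def element_to_pixel(layout, n, h, w, c, N, H, W):
        """Map NHWC element E[n, h, w, c] to (i, j, k): pixel P[i, j], lane k."""
        if layout == "channel_major":
            # 4 consecutive channels share a pixel; each row is one (n, h) pair.
            return (c // 4) * W + w, n * H + h, c % 4
        if layout == "height_major":
            # 4 consecutive rows share a pixel; n varies fastest along j.
            return c * W + w, (h // 4) * N + n, h % 4
        if layout == "width_major":
            # 4 consecutive columns share a pixel; W' = (W+3)/4 pixels per channel.
            wb = (W + 3) // 4
            return c * wb + w // 4, n * H + h, w % 4
        raise ValueError("unknown layout: %s" % layout)


    def check_layout(layout, N, H, W, C):
        """Check that every element maps to a distinct, in-bounds image slot."""
        width = {"channel_major": W * ((C + 3) // 4),
                 "height_major": W * C,
                 "width_major": ((W + 3) // 4) * C}[layout]
        height = {"channel_major": N * H,
                  "height_major": N * ((H + 3) // 4),
                  "width_major": N * H}[layout]
        seen = set()
        for n in range(N):
            for h in range(H):
                for w in range(W):
                    for c in range(C):
                        i, j, k = element_to_pixel(layout, n, h, w, c, N, H, W)
                        assert 0 <= i < width and 0 <= j < height and 0 <= k < 4
                        assert (i, j, k) not in seen
                        seen.add((i, j, k))
        return True


    print(check_layout("width_major", 2, 5, 6, 7))  # True
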

Filter Tensor
~~~~~~~~~~~~~


.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Tensor type
      - Buffer
      - Image size [width, height]
      - Explanation
    * - Convolution Filter
      - HWOI
      - [RoundUp<4>(I), H * W * (O+3)/4]
      - Convolution filter format; there is no difference compared to [H * W * I, (O+3)/4]
    * - Depthwise Convolution Filter
      - HWIM
      - [H * W * M, (I+3)/4]
      - Depthwise-Convolution filter format


Each pixel of the **Image** contains 4 elements. The table below lists the
coordinate relationship between the **Image** and the **Buffer**.


.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Tensor type
      - Pixel coordinate relationship
      - Explanation
    * - Convolution Filter
      - P[m, n] = {E[h, w, o, i] | (h=T/W, w=T%W, o=[n/HW * 4 + k], i=m)}
      - HW = H * W, T = n%HW, k=[0, 4)
    * - Depthwise Convolution Filter
      - P[m, n] = {E[h, w, i, 0] | (h=m/W, w=m%W, i=[n * 4 + k])}
      - Only multiplier == 1 is supported, k=[0, 4)

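
The following Python sketch packs both filter types into images by evaluating
the pixel relationships above for every pixel. The filters are given as nested
lists indexed in the same order as the E[...] terms (``buf[h][w][o][i]`` and
``buf[h][w][i][0]``). The function names, the nested-list representation and
zero-filling of padding lanes are assumptions for illustration, not MACE's own
code.

.. code-block:: python

    def conv_filter_to_image(buf, H, W, O, I):
        """Pack conv filter buf[h][w][o][i]; returns image[n][m] = pixel P[m, n].

        Lanes that fall outside O or I are padded with 0.0 (an assumption).
        """
        hw = H * W
        width = (I + 3) // 4 * 4        # RoundUp<4>(I)
        height = hw * ((O + 3) // 4)    # H * W * (O+3)/4
        image = []
        for n in range(height):
            t = n % hw                  # T = n % HW
            h, w = t // W, t % W
            row = []
            for m in range(width):
                i = m
                pixel = []
                for k in range(4):
                    o = (n // hw) * 4 + k
                    pixel.append(buf[h][w][o][i] if o < O and i < I else 0.0)
                row.append(pixel)
            image.append(row)
        return image


    def depthwise_filter_to_image(buf, H, W, I):
        """Pack depthwise filter buf[h][w][i][0] (multiplier M == 1 only)."""
        width = H * W                   # H * W * M with M == 1
        height = (I + 3) // 4
        image = []
        for n in range(height):
            row = []
            for m in range(width):
                h, w = m // W, m % W
                pixel = []
                for k in range(4):
                    i = n * 4 + k
                    pixel.append(buf[h][w][i][0] if i < I else 0.0)
                row.append(pixel)
            image.append(row)
        return image
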
1-D Argument Tensor

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Tensor type
      - Buffer
      - Image size [width, height]
      - Explanation
    * - 1-D Argument
      - W
      - [(W+3)/4, 1]
      - 1-D argument format, e.g. Bias


Each pixel of the **Image** contains 4 elements. The table below lists the
coordinate relationship between the **Image** and the **Buffer**.


.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Tensor type
      - Pixel coordinate relationship
      - Explanation
    * - 1-D Argument
      - P[i, 0] = {E[w] | w=i*4+k}
      - k=[0, 4)

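
For example, a bias of length W occupies a single image row of (W+3)/4 pixels.
The Python sketch below builds that row from the relationship above; the
function name ``argument_to_image`` and zero-padding of the trailing lanes are
assumptions for illustration.

.. code-block:: python

    def argument_to_image(values):
        """Pack a 1-D argument of length W into (W+3)/4 pixels of 4 elements.

        Pixel i holds E[i*4 + k] for k in [0, 4); lanes past the end are
        padded with 0.0 (an assumption).
        """
        width = (len(values) + 3) // 4
        return [[values[i * 4 + k] if i * 4 + k < len(values) else 0.0
                 for k in range(4)]
                for i in range(width)]


    bias = [0.5, -1.0, 2.0, 3.0, 4.5]
    print(argument_to_image(bias))
    # [[0.5, -1.0, 2.0, 3.0], [4.5, 0.0, 0.0, 0.0]]
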
Here is a deployment file example used by the Android demo application.

TODO: change this example file to the demo deployment file
(reuse the same file) and rename to a reasonable name.

.. literalinclude:: models/demo_app_models.yaml
    :language: yaml


.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Configuration key
      - Description
    * - target_abis
      - The target ABIs to build; one or more of 'host', 'armeabi-v7a' or 'arm64-v8a'
    * - embed_model_data
      - Whether to embed the model weights into the generated code; defaults to 1
    * - platform
      - The source framework, either tensorflow or caffe
    * - model_file_path
      - The path of the model file; can be local or remote
    * - weight_file_path
      - The path of the model weights file, used by Caffe models
    * - model_sha256_checksum
      - The SHA256 checksum of the model file
    * - weight_sha256_checksum
      - The SHA256 checksum of the weights file, used by Caffe models
    * - input_nodes
      - The input node names; one or more strings
    * - output_nodes
      - The output node names; one or more strings
    * - input_shapes
      - The shapes of the input nodes, in NHWC order
    * - output_shapes
      - The shapes of the output nodes, in NHWC order
    * - runtime
      - The device to run on; one of CPU, GPU or DSP
    * - limit_opencl_kernel_time
      - Whether to split OpenCL kernels so that each runs within 1 ms, keeping the UI responsive; defaults to 0
    * - dsp_mode
      - Controls the DSP precision and performance; the default value 0 works for most cases
    * - obfuscate
      - Whether to obfuscate the model operator names; defaults to 0
    * - fast_conv
      - Whether to enable Winograd convolution; **this increases memory consumption**
    * - input_files
      - NumPy validation inputs; when not provided, random values in [-1, 1] are used

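
The ``model_sha256_checksum`` and ``weight_sha256_checksum`` values are SHA256
digests of the corresponding files. A minimal Python sketch for computing one
from a local file with the standard ``hashlib`` module follows, assuming the
deployment file expects the lowercase hexadecimal form; the helper name
``sha256_of_file`` is hypothetical. On most Linux systems,
``sha256sum path/to/file`` prints the same hex digest.

.. code-block:: python

    import hashlib
    import sys


    def sha256_of_file(path, chunk_size=1 << 20):
        """Return the hex SHA256 digest of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read fixed-size chunks so large model files need not fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    if __name__ == "__main__" and len(sys.argv) > 1:
        # Usage: python sha256_of_file.py path/to/model.pb
        print(sha256_of_file(sys.argv[1]))
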

Operator lists
==============

.. Please keep in chronological order when editing

.. csv-table::
    :widths: auto
    :header: "Operator","Android NN","Supported","Remark"

    "BATCH_NORM","","Y","Fusion with activation is supported"
    "CONV_2D","Y","Y","Fusion with BN and activation layer is supported"
    "DEPTHWISE_CONV_2D","Y","Y","Only multiplier = 1 is supported; fusion is supported"
    "RESHAPE","Y","Y","Limited support"

