From 6782e5b3c75d78778ceac7ff806b3c1146fd3b78 Mon Sep 17 00:00:00 2001
From: liuqi
Date: Mon, 26 Feb 2018 10:56:35 +0800
Subject: [PATCH] Update opencl readme.

---
 README.md                     | 23 ----------------------
 mace/kernels/opencl/REAEMD.md | 50 +++++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/README.md b/README.md
index 27a05bda..3e36fe72 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,7 @@
 # **MACE** - *Mobile(Mi) Accelerated Compute Engine Library*
 ---
 ## Introduction
 ---
 **Accelerating Neural Network with Heterogeneous Computing Devices in the phone.**
 
 Supported Devices: **CPU(NEON)/GPU/DSP**.
-
-## Architecture
----
-- Use computational pattern of **DAG consisting of Ops**.
-- **Tensor** objects manage all data.
-- **Workspace** manage all **Tensors**.
-
-## GPU
----
-Use **Image** object to optimize memory access and parallel computing based on OpenCL 2.0.
-
-Design the corresponding **Image** format to optimize memory access for different Op algorithm.
-Each pixel of **Image** object contains four elements(e.g. RGBA).
-
-The Following is **Buffer** and **Image** format for all **Tensors**.
-
-| Tensor| Buffer| Image| Explanation|
-| --------- | :---------:|:--------:|:----:|
-|Channel-Major Input/Output | NHWC | [W * (C+3)/4, N * H] | Default Input/Output format|
-|Height-Major Input/Output | NHWC | [W * C, N * (H+3)/4] | Winograd Convolution format|
-|Width-Major Input/Output | NHWC | [(W+3)/4 * C, N * H] | Winograd Convolution format|
-|Convolution Filter | HWOI | [H * W * RoundUp<4>(I), (O+3)/4]|Convolution filter format,There is no difference compared to [H*w*I, (O+3)/4]|
-|Depthwise Convlution Filter | HWIM | [H * W * M, (I+3)/4]|Depthwise-Convolution filter format|
-|1-D Argument | W | [(W+3)/4, 1] | 1D argument format, e.g. Bias|
\ No newline at end of file
diff --git a/mace/kernels/opencl/REAEMD.md b/mace/kernels/opencl/REAEMD.md
index 9546e21e..c6f42fd5 100644
--- a/mace/kernels/opencl/REAEMD.md
+++ b/mace/kernels/opencl/REAEMD.md
@@ -1,12 +1,58 @@
 OpenCL Image Storage Layout
 ===
+Use **Image** objects to optimize memory access and parallel computing, based on OpenCL 2.0.
+
+
+Design the corresponding **Image** format to optimize memory access for different Op algorithms.
+Each pixel of an **Image** object contains 4 elements (e.g. RGBA).
+
+
+The following are the **Buffer** and **Image** formats for all **Tensors**.
 
 Input/Output
 ---
+**MACE** uses the NHWC format for Input/Output.
+
+| Tensor| Buffer| Image Size [Width, Height]| Explanation|
+| --------- | :---------:|:--------:|:----:|
+|Channel-Major Input/Output | NHWC | [W * (C+3)/4, N * H] | Default Input/Output format|
+|Height-Major Input/Output | NHWC | [W * C, N * (H+3)/4] | Winograd Convolution format|
+|Width-Major Input/Output | NHWC | [(W+3)/4 * C, N * H] | Winograd Convolution format|
+
+Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation
+between **Image** and **Buffer**.
 
-Conv2D Filter
+| Tensor| Pixel Coordinate Relation| Explanation|
+| --------- | :---------:| :-----: |
+|Channel-Major Input/Output | P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=i%W, c=[i/W * 4 + k])}| k=[0, 4)|
+|Height-Major Input/Output | P[i, j] = {E[n, h, w, c] | (n=j%N, h=[j/H*4 + k], w=i%W, c=i/W)}| k=[0, 4)|
+|Width-Major Input/Output | P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=[i%W*4 + k], c=i/W)}| k=[0, 4)|
+
+
+Filter
 ---
+| Tensor| Buffer| Image Size [Width, Height]| Explanation|
+| --------- | :---------:|:--------:|:----:|
+|Convolution Filter | HWOI | [H * W * RoundUp<4>(I), (O+3)/4]| Convolution filter format, no different from [H * W * I, (O+3)/4]|
+|Depthwise Convolution Filter | HWIM | [H * W * M, (I+3)/4]| Depthwise-Convolution filter format|
+
+Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation
+between **Image** and **Buffer**.
 
-Depthwise Conv2D Filter
+| Tensor| Pixel Coordinate Relation| Explanation|
+| --------- | :---------:| :-----:|
+|Convolution Filter | P[m, n] = {E[h, w, o, i] | (h=T/W, w=T%W, o=[n*4+k], i=m%RI)}| RI=((I + 3) / 4) * 4, T=m/RI, k=[0, 4)|
+|Depthwise Convolution Filter | P[m, n] = {E[h, w, i, 0] | (h=m/W, w=m%W, i=[n*4+k])}| only supports multiplier == 1, k=[0, 4)|
+
+1-D Argument
 ---
+| Tensor| Buffer| Image Size [Width, Height]| Explanation|
+| --------- | :---------:|:--------:|:----:|
+|1-D Argument | W | [(W+3)/4, 1] | 1D argument format, e.g. Bias|
+
+Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation
+between **Image** and **Buffer**.
+| Tensor| Pixel Coordinate Relation| Explanation|
+| --------- | :---------:| :-----:|
+|1-D Argument | P[i, 0] = {E[w] | w=i*4+k}| k=[0, 4)|
 
--
GitLab
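To make the channel-major relation above concrete: for an NHWC element E[n, h, w, c], the pixel is P[(c/4)*W + w, n*H + h] and the element sits in slot k = c%4, which is simply the inverse of n=j/H, h=j%H, w=i%W, c=i/W*4+k. The sketch below is a minimal host-side illustration of that arithmetic; `ChannelMajorCoord` and the sample shape are assumptions made for this example, not code from the MACE sources.

```cpp
// Minimal sketch (illustrative, not from the MACE sources): map an NHWC
// buffer element E[n, h, w, c] to its OpenCL image pixel P[i, j] and the
// slot k inside that 4-element pixel, for the channel-major layout.
#include <cassert>
#include <cstdio>

struct ImageCoord {
  int i;  // image x: (c / 4) * W + w
  int j;  // image y: n * H + h
  int k;  // slot inside the 4-element pixel: c % 4
};

// Hypothetical helper, named here only for the example.
ImageCoord ChannelMajorCoord(int n, int h, int w, int c, int H, int W) {
  return ImageCoord{(c / 4) * W + w, n * H + h, c % 4};
}

int main() {
  // Example shape: N=1, H=3, W=5, C=6 gives an image of size
  // [W * (C+3)/4, N * H] = [10, 3], as in the Input/Output table.
  const int N = 1, H = 3, W = 5, C = 6;
  const int image_width = W * ((C + 3) / 4);  // 10
  const int image_height = N * H;             // 3

  // Element E[0, 2, 4, 5] lands in pixel P[9, 2], slot 1.
  ImageCoord p = ChannelMajorCoord(0, 2, 4, 5, H, W);
  assert(p.i == 9 && p.j == 2 && p.k == 1);
  assert(p.i < image_width && p.j < image_height);
  std::printf("P[%d, %d], slot %d\n", p.i, p.j, p.k);
  return 0;
}
```

Packing 4 consecutive channels into one pixel is what yields the image width W * (C+3)/4 in the table above, and it lets a kernel fetch 4 channel values with a single image read.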