From 6782e5b3c75d78778ceac7ff806b3c1146fd3b78 Mon Sep 17 00:00:00 2001
From: liuqi
Date: Mon, 26 Feb 2018 10:56:35 +0800
Subject: [PATCH] Update opencl readme.

---
 README.md                     | 23 ----------------------
 mace/kernels/opencl/REAEMD.md | 50 +++++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/README.md b/README.md
index 27a05bda..3e36fe72 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,7 @@
 # **MACE** - *Mobile(Mi) Accelerated Compute Engine Library*
 ---
 ## Introduction
 ---
 **Accelerating Neural Network with Heterogeneous Computing Devices in the phone.**
 
 Supported Devices: **CPU(NEON)/GPU/DSP**.
-
-## Architecture
----
-- Use computational pattern of **DAG consisting of Ops**.
-- **Tensor** objects manage all data.
-- **Workspace** manage all **Tensors**.
-
-## GPU
----
-Use **Image** object to optimize memory access and parallel computing based on OpenCL 2.0.
-
-Design the corresponding **Image** format to optimize memory access for different Op algorithm.
-Each pixel of **Image** object contains four elements(e.g. RGBA).
-
-The Following is **Buffer** and **Image** format for all **Tensors**.
-
-| Tensor| Buffer| Image| Explanation|
-| --------- | :---------:|:--------:|:----:|
-|Channel-Major Input/Output | NHWC | [W * (C+3)/4, N * H] | Default Input/Output format|
-|Height-Major Input/Output | NHWC | [W * C, N * (H+3)/4] | Winograd Convolution format|
-|Width-Major Input/Output | NHWC | [(W+3)/4 * C, N * H] | Winograd Convolution format|
-|Convolution Filter | HWOI | [H * W * RoundUp<4>(I), (O+3)/4]|Convolution filter format,There is no difference compared to [H*w*I, (O+3)/4]|
-|Depthwise Convlution Filter | HWIM | [H * W * M, (I+3)/4]|Depthwise-Convolution filter format|
-|1-D Argument | W | [(W+3)/4, 1] | 1D argument format, e.g. Bias|
\ No newline at end of file
diff --git a/mace/kernels/opencl/REAEMD.md b/mace/kernels/opencl/REAEMD.md
index 9546e21e..c6f42fd5 100644
--- a/mace/kernels/opencl/REAEMD.md
+++ b/mace/kernels/opencl/REAEMD.md
@@ -1,12 +1,58 @@
 OpenCL Image Storage Layout
 ===
+Use **Image** objects to optimize memory access and parallel computing, based on OpenCL 2.0.
+
+
+Design the corresponding **Image** format to optimize memory access for different Op algorithms.
+Each pixel of an **Image** object contains 4 elements (e.g. RGBA).
+
+
+The following are the **Buffer** and **Image** formats for all **Tensors**.
 
 Input/Output
 ---
+**MACE** uses the NHWC format for Input/Output.
+
+| Tensor| Buffer| Image Size [Width, Height]| Explanation|
+| --------- | :---------:|:--------:|:----:|
+|Channel-Major Input/Output | NHWC | [W * (C+3)/4, N * H] | Default Input/Output format|
+|Height-Major Input/Output | NHWC | [W * C, N * (H+3)/4] | Winograd Convolution format|
+|Width-Major Input/Output | NHWC | [(W+3)/4 * C, N * H] | Winograd Convolution format|
+
+Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation
+between **Image** and **Buffer**.
 
-Conv2D Filter
+| Tensor| Pixel Coordinate Relation| Explanation|
+| --------- | :---------:| :-----: |
+|Channel-Major Input/Output | P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=i%W, c=[i/W * 4 + k])}| k=[0, 4)|
+|Height-Major Input/Output | P[i, j] = {E[n, h, w, c] | (n=j%N, h=[j/H*4 + k], w=i%W, c=i/W)}| k=[0, 4)|
+|Width-Major Input/Output | P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=[i%W*4 + k], c=i/W)}| k=[0, 4)|
+
+
+Filter
 ---
+| Tensor| Buffer| Image Size [Width, Height]| Explanation|
+| --------- | :---------:|:--------:|:----:|
+|Convolution Filter | HWOI | [H * W * RoundUp<4>(I), (O+3)/4]| Convolution filter format, no different from [H * W * I, (O+3)/4]|
+|Depthwise Convolution Filter | HWIM | [H * W * M, (I+3)/4]| Depthwise-Convolution filter format|
+
+Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation
+between **Image** and **Buffer**.
 
-Depthwise Conv2D Filter
+| Tensor| Pixel Coordinate Relation| Explanation|
+| --------- | :---------:| :-----:|
+|Convolution Filter | P[m, n] = {E[h, w, o, i] | (h=T/W, w=T%W, o=[n*4+k], i=m%RI)}| RI=((I + 3) / 4) * 4, T=m/RI, k=[0, 4)|
+|Depthwise Convolution Filter | P[m, n] = {E[h, w, i, 0] | (h=m/W, w=m%W, i=[n*4+k])}| only supports multiplier == 1, k=[0, 4)|
+
+1-D Argument
 ---
+| Tensor| Buffer| Image Size [Width, Height]| Explanation|
+| --------- | :---------:|:--------:|:----:|
+|1-D Argument | W | [(W+3)/4, 1] | 1D argument format, e.g. Bias|
+
+Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation
+between **Image** and **Buffer**.
+| Tensor| Pixel Coordinate Relation| Explanation|
+| --------- | :---------:| :-----:|
+|1-D Argument | P[i, 0] = {E[w] | w=i*4+k}| k=[0, 4)|
 
--
GitLab
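To make the channel-major relation above concrete: for an NHWC element E[n, h, w, c], the pixel is P[(c/4)*W + w, n*H + h] and the element sits in slot k = c%4, which is simply the inverse of n=j/H, h=j%H, w=i%W, c=i/W*4+k. The sketch below is a minimal host-side illustration of that arithmetic; `ChannelMajorCoord` and the sample shape are assumptions made for this example, not code from the MACE sources.

```cpp
// Minimal sketch (illustrative, not from the MACE sources): map an NHWC
// buffer element E[n, h, w, c] to its OpenCL image pixel P[i, j] and the
// slot k inside that 4-element pixel, for the channel-major layout.
#include <cassert>
#include <cstdio>

struct ImageCoord {
  int i;  // image x: (c / 4) * W + w
  int j;  // image y: n * H + h
  int k;  // slot inside the 4-element pixel: c % 4
};

// Hypothetical helper, named here only for the example.
ImageCoord ChannelMajorCoord(int n, int h, int w, int c, int H, int W) {
  return ImageCoord{(c / 4) * W + w, n * H + h, c % 4};
}

int main() {
  // Example shape: N=1, H=3, W=5, C=6 gives an image of size
  // [W * (C+3)/4, N * H] = [10, 3], as in the Input/Output table.
  const int N = 1, H = 3, W = 5, C = 6;
  const int image_width = W * ((C + 3) / 4);  // 10
  const int image_height = N * H;             // 3

  // Element E[0, 2, 4, 5] lands in pixel P[9, 2], slot 1.
  ImageCoord p = ChannelMajorCoord(0, 2, 4, 5, H, W);
  assert(p.i == 9 && p.j == 2 && p.k == 1);
  assert(p.i < image_width && p.j < image_height);
  std::printf("P[%d, %d], slot %d\n", p.i, p.j, p.k);
  return 0;
}
```

Packing 4 consecutive channels into one pixel is what yields the image width W * (C+3)/4 in the table above, and it lets a kernel fetch 4 channel values with a single image read.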