Xiaomi / Mace
Commit 6782e5b3, authored Feb 26, 2018 by liuqi

Update opencl readme.

Parent: 52a04f77

Showing 2 changed files with 48 additions and 25 deletions:
- README.md (+0 -23)
- mace/kernels/opencl/REAEMD.md (+48 -2)
README.md @ 6782e5b3
# **MACE** - *Mobile(Mi) Accelerated Compute Engine Library*
---
## Introduction
---
**Accelerating neural networks with heterogeneous computing devices on mobile phones.**

Supported devices: **CPU (NEON)/GPU/DSP**.
## Architecture
---
- Computation is expressed as a **DAG consisting of Ops**.
- **Tensor** objects manage all data.
- A **Workspace** manages all **Tensors**.
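The minimal Python sketch below makes the three architecture points concrete. Ops form a DAG, each Op reads and writes named Tensors, and a single Workspace owns all of them. The names `Tensor`, `Workspace`, `Op`, and `run_graph` are hypothetical and do not come from MACE's C++ API. Scheduling the DAG in topological order with Kahn's algorithm is also an assumption about how such a graph would be run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tensor:
    """Hypothetical stand-in for a MACE Tensor: a shape plus a flat data buffer."""
    shape: List[int]
    data: List[float]


class Workspace:
    """Hypothetical Workspace: the single owner of every named Tensor."""

    def __init__(self) -> None:
        self._tensors: Dict[str, Tensor] = {}

    def put(self, name: str, tensor: Tensor) -> None:
        self._tensors[name] = tensor

    def get(self, name: str) -> Tensor:
        return self._tensors[name]


@dataclass
class Op:
    """A DAG node: reads named input tensors and produces one named output tensor."""
    name: str
    inputs: List[str]
    output: str
    fn: Callable[[List[Tensor]], Tensor]


def run_graph(ops: List[Op], ws: Workspace) -> None:
    """Run ops in dependency (topological) order using Kahn's algorithm."""
    producers = {op.output for op in ops}
    # Count, per op, how many of its inputs are produced by other ops.
    pending = {op.name: sum(1 for t in op.inputs if t in producers) for op in ops}
    consumers: Dict[str, List[Op]] = {}
    for op in ops:
        for t in op.inputs:
            if t in producers:
                consumers.setdefault(t, []).append(op)
    ready = [op for op in ops if pending[op.name] == 0]
    executed = 0
    while ready:
        op = ready.pop()
        ws.put(op.output, op.fn([ws.get(t) for t in op.inputs]))
        executed += 1
        for nxt in consumers.get(op.output, []):
            pending[nxt.name] -= 1
            if pending[nxt.name] == 0:
                ready.append(nxt)
    if executed != len(ops):
        raise ValueError("graph contains a cycle")


# Usage: relu -> scale, deliberately listed out of order.
ws = Workspace()
ws.put("input", Tensor([1, 3], [1.0, -2.0, 3.0]))
relu = Op("relu", ["input"], "relu_out",
          lambda xs: Tensor(xs[0].shape, [max(v, 0.0) for v in xs[0].data]))
scale = Op("scale", ["relu_out"], "output",
           lambda xs: Tensor(xs[0].shape, [2.0 * v for v in xs[0].data]))
run_graph([scale, relu], ws)
print(ws.get("output").data)  # [2.0, 0.0, 6.0]
```

Keeping every tensor in one Workspace means Ops only exchange names. Ownership and lifetime of the data therefore stay in one place.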
## GPU
---
On the GPU, MACE uses **Image** objects, based on OpenCL 2.0, to optimize memory access and parallel computing.
Each Op algorithm has a corresponding **Image** format designed to optimize its memory access.
Each pixel of an **Image** object contains four elements (e.g. RGBA).
The following table lists the **Buffer** and **Image** formats for all **Tensors**.
| Tensor | Buffer | Image Size [Width, Height] | Explanation |
| --------- | :---------: | :--------: | :----: |
| Channel-Major Input/Output | NHWC | [W * (C+3)/4, N * H] | Default Input/Output format |
| Height-Major Input/Output | NHWC | [W * C, N * (H+3)/4] | Winograd Convolution format |
| Width-Major Input/Output | NHWC | [(W+3)/4 * C, N * H] | Winograd Convolution format |
| Convolution Filter | HWOI | [H * W * RoundUp<4>(I), (O+3)/4] | Convolution filter format. There is no difference compared to [H * W * I, (O+3)/4] |
| Depthwise Convolution Filter | HWIM | [H * W * M, (I+3)/4] | Depthwise-Convolution filter format |
| 1-D Argument | W | [(W+3)/4, 1] | 1-D argument format, e.g. Bias |
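The size formulas in the table reduce to integer arithmetic. In the sketch below, (X+3)/4 is read as integer division, which equals the ceiling of X/4. RoundUp<4>(I) is read as I rounded up to a multiple of 4. Both readings are assumptions. The `kind` strings are hypothetical labels for the table rows and are not MACE identifiers.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b; (X+3)/4 in the table equals ceil_div(X, 4)."""
    return (a + b - 1) // b


def round_up(a: int, multiple: int) -> int:
    """RoundUp<multiple>(a): smallest multiple of `multiple` that is >= a."""
    return ceil_div(a, multiple) * multiple


def image_shape(kind: str, dims: tuple) -> tuple:
    """Return the image size (width, height) in pixels; each pixel holds 4 elements.

    `dims` follows the Buffer column: NHWC -> (N, H, W, C), HWOI -> (H, W, O, I),
    HWIM -> (H, W, I, M), W -> (W,). The `kind` labels are hypothetical.
    """
    if kind == "channel_major":
        n, h, w, c = dims
        return (w * ceil_div(c, 4), n * h)
    if kind == "height_major":
        n, h, w, c = dims
        return (w * c, n * ceil_div(h, 4))
    if kind == "width_major":
        n, h, w, c = dims
        return (ceil_div(w, 4) * c, n * h)
    if kind == "conv_filter":
        h, w, o, i = dims
        return (h * w * round_up(i, 4), ceil_div(o, 4))
    if kind == "depthwise_filter":
        h, w, i, m = dims
        return (h * w * m, ceil_div(i, 4))
    if kind == "arg_1d":
        (w,) = dims
        return (ceil_div(w, 4), 1)
    raise ValueError(f"unknown tensor kind: {kind}")


# A 1x8x8x3 input, a 3x3 filter with O=32 and I=3, and a bias of length 10.
print(image_shape("channel_major", (1, 8, 8, 3)))  # (8, 8)
print(image_shape("conv_filter", (3, 3, 32, 3)))   # (36, 8)
print(image_shape("arg_1d", (10,)))                # (3, 1)
```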
mace/kernels/opencl/REAEMD.md @ 6782e5b3
OpenCL Image Storage Layout
===
MACE uses **Image** objects, based on OpenCL 2.0, to optimize memory access and parallel computing.
Each Op algorithm has a corresponding **Image** format designed to optimize its memory access.
Each pixel of an **Image** object contains 4 elements (e.g. RGBA).
The following tables list the **Buffer** and **Image** formats for all **Tensors**.
Input/Output
---
**Mace** uses the NHWC format for input and output tensors.
| Tensor | Buffer | Image Size [Width, Height] | Explanation |
| --------- | :---------: | :--------: | :----: |
| Channel-Major Input/Output | NHWC | [W * (C+3)/4, N * H] | Default Input/Output format |
| Height-Major Input/Output | NHWC | [W * C, N * (H+3)/4] | Winograd Convolution format |
| Width-Major Input/Output | NHWC | [(W+3)/4 * C, N * H] | Winograd Convolution format |
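Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation between **Image** and **Buffer**. P[i, j] is the pixel at width index i and height index j, and E[n, h, w, c] is an element of the NHWC **Buffer**.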
Conv2D Filter
| Tensor | Pixel Coordinate Relation | Explanation |
| --------- | :---------: | :-----: |
| Channel-Major Input/Output | P[i, j] = {E[n, h, w, c] \| (n=j/H, h=j%H, w=i%W, c=[i/W * 4 + k])} | k=[0, 4) |
| Height-Major Input/Output | P[i, j] = {E[n, h, w, c] \| (n=j%N, h=[j/H * 4 + k], w=i%W, c=i/W)} | k=[0, 4) |
| Width-Major Input/Output | P[i, j] = {E[n, h, w, c] \| (n=j/H, h=j%H, w=[i%W * 4 + k], c=i/W)} | k=[0, 4) |
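The Channel-Major relation can be checked by packing a dense NHWC buffer into an image. The sketch below uses the formulas from the first row directly. The image is stored as `image[j][i]`, one 4-element RGBA list per pixel. Channels past C in the last group of 4 are filled with zeros, which is an assumption because the table does not state a padding value. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Index4 = Tuple[int, int, int, int]  # (n, h, w, c) index or (N, H, W, C) shape


def channel_major_pixel(i: int, j: int, shape: Index4) -> List[Optional[Index4]]:
    """NHWC indices of the 4 elements of pixel P[i, j]; None marks padded channels."""
    _, H, W, C = shape
    n, h, w = j // H, j % H, i % W
    indices: List[Optional[Index4]] = []
    for k in range(4):
        c = (i // W) * 4 + k
        indices.append((n, h, w, c) if c < C else None)
    return indices


def nhwc_offset(idx: Index4, shape: Index4) -> int:
    """Flat offset of element E[n, h, w, c] in a dense NHWC buffer."""
    n, h, w, c = idx
    _, H, W, C = shape
    return ((n * H + h) * W + w) * C + c


def buffer_to_image(buf: List[float], shape: Index4) -> List[List[List[float]]]:
    """Pack an NHWC buffer into a Channel-Major image indexed image[j][i][k]."""
    N, H, W, C = shape
    width, height = W * ((C + 3) // 4), N * H
    image = [[[0.0] * 4 for _ in range(width)] for _ in range(height)]
    for j in range(height):
        for i in range(width):
            for k, idx in enumerate(channel_major_pixel(i, j, shape)):
                if idx is not None:
                    image[j][i][k] = buf[nhwc_offset(idx, shape)]
    return image


shape = (1, 2, 2, 5)  # N=1, H=2, W=2, C=5 -> image width 4, height 2
img = buffer_to_image([float(x) for x in range(20)], shape)
print(img[0][1])  # [5.0, 6.0, 7.0, 8.0]
print(img[1][2])  # [14.0, 0.0, 0.0, 0.0]
```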
Filter
---
| Tensor | Buffer | Image Size [Width, Height] | Explanation |
| --------- | :---------: | :--------: | :----: |
| Convolution Filter | HWOI | [H * W * RoundUp<4>(I), (O+3)/4] | Convolution filter format. There is no difference compared to [H * W * I, (O+3)/4] |
| Depthwise Convolution Filter | HWIM | [H * W * M, (I+3)/4] | Depthwise-Convolution filter format |
Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation between **Image** and **Buffer**. P[m, n] is the pixel at width index m and height index n.
Depthwise Conv2D Filter
| Tensor | Pixel Coordinate Relation | Explanation |
| --------- | :---------: | :-----: |
| Convolution Filter | P[m, n] = {E[h, w, o, i] \| (h=T/W, w=T%W, o=[n * 4 + k], i=m%RI)} | RI=((I + 3) / 4) * 4, T=m/RI, k=[0, 4) |
| Depthwise Convolution Filter | P[m, n] = {E[h, w, i, 0] \| (h=m/W, w=m%W, i=[n * 4 + k])} | Only multiplier == 1 is supported, k=[0, 4) |
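The two filter relations can be written out as index functions. The sketch below returns, for each of the 4 slots of pixel P[m, n], the buffer index it holds. A slot is marked None when o >= O or i >= I. That padding rule is inferred from the RoundUp<4>(I) and (O+3)/4 image sizes and is not stated in the table. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Index4 = Tuple[int, int, int, int]


def conv_filter_pixel(m: int, n: int, shape: Index4) -> List[Optional[Index4]]:
    """HWOI indices (h, w, o, i) of the 4 elements of pixel P[m, n]; None marks padding."""
    _, W, O, I = shape
    ri = ((I + 3) // 4) * 4  # RI: input channels rounded up to a multiple of 4
    t = m // ri              # T: flattened (h, w) position
    h, w, i = t // W, t % W, m % ri
    result: List[Optional[Index4]] = []
    for k in range(4):
        o = n * 4 + k
        result.append((h, w, o, i) if o < O and i < I else None)
    return result


def depthwise_filter_pixel(m: int, n: int, shape: Index4) -> List[Optional[Index4]]:
    """HWIM indices (h, w, i, 0) of the 4 elements of pixel P[m, n]; requires M == 1."""
    _, W, I, M = shape
    if M != 1:
        raise ValueError("only multiplier == 1 is supported")
    h, w = m // W, m % W
    result: List[Optional[Index4]] = []
    for k in range(4):
        i = n * 4 + k
        result.append((h, w, i, 0) if i < I else None)
    return result


# 3x3 filter with O=6 outputs and I=3 inputs: RI = 4, image width 36, height 2.
print(conv_filter_pixel(13, 1, (3, 3, 6, 3)))
# [(1, 0, 4, 1), (1, 0, 5, 1), None, None]
```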
1-D Argument
---
| Tensor| Buffer| Image Size [Width, Height]| Explanation|
| --------- | :---------:|:--------:|:----:|
|1-D Argument | W | [(W+3)/4, 1] | 1D argument format, e.g. Bias|
Each pixel of an **Image** contains 4 elements. The table below lists the coordinate relation between **Image** and **Buffer**.
| Tensor | Pixel Coordinate Relation | Explanation |
| --------- | :---------: | :-----: |
| 1-D Argument | P[i, 0] = {E[w] \| w=i * 4 + k} | k=[0, 4) |
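For a 1-D argument such as a bias, the relation packs 4 consecutive elements into each pixel of a single-row image. The sketch below is a minimal version under one assumption: slots past the end of the vector are zero-filled. The function name is hypothetical.

```python
from typing import List


def pack_1d_argument(values: List[float]) -> List[List[List[float]]]:
    """Pack a 1-D argument (e.g. a bias of length W) into a [(W+3)/4, 1] image.

    Pixel P[i, 0] holds E[w] for w = i * 4 + k, k in [0, 4). Slots with w >= W
    are zero-filled; that padding value is an assumption.
    """
    width = (len(values) + 3) // 4
    row = []
    for i in range(width):
        pixel = []
        for k in range(4):
            w = i * 4 + k
            pixel.append(values[w] if w < len(values) else 0.0)
        row.append(pixel)
    return [row]  # height is always 1


print(pack_1d_argument([1.0, 2.0, 3.0, 4.0, 5.0]))
# [[[1.0, 2.0, 3.0, 4.0], [5.0, 0.0, 0.0, 0.0]]]
```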