Memory layout
==============

CPU runtime memory layout
--------------------------
The CPU tensor buffer is organized in the following order:

.. list-table::
    :header-rows: 1

    * - Tensor type
      - Buffer
    * - Intermediate input/output
      - NCHW
    * - Convolution Filter
      - OIHW
    * - Depthwise Convolution Filter
      - MIHW
    * - 1-D Argument, length = W
      - W
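
As an illustration of the orders listed above, the following minimal sketch
computes the flat offset of one element in an NCHW buffer. It assumes the
buffer is dense and row-major with no padding, which the table does not state
explicitly; ``nchw_offset`` is a hypothetical helper name, not part of the
runtime.

.. code-block:: python

    def nchw_offset(n, c, h, w, shape):
        """Flat index of element (n, c, h, w) in a dense NCHW buffer."""
        _, C, H, W = shape  # the batch size is not needed for the offset
        return ((n * C + c) * H + h) * W + w

    shape = (1, 3, 2, 2)  # N, C, H, W
    print(nchw_offset(0, 1, 0, 1, shape))  # 5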

GPU runtime memory layout
--------------------------
The GPU runtime implementation is based on OpenCL and uses 2D images with the
CL_RGBA channel order as tensor storage. This requires OpenCL 1.2 or above.

How tensor data is mapped to an OpenCL 2D image (RGBA) is critical for
kernel performance.

In the CL_RGBA channel order, each 2D image pixel contains 4 data items.
The following tables describe the mapping from different types of tensors to
a 2D RGBA image.

Input/Output Tensor
~~~~~~~~~~~~~~~~~~~~

The Input/Output Tensor is stored in NHWC format:

.. list-table::
    :header-rows: 1

    * - Tensor type
      - Buffer
      - Image size [width, height]
      - Explanation
    * - Channel-Major Input/Output
      - NHWC
      - [W * (C+3)/4, N * H]
      - Default Input/Output format
    * - Height-Major Input/Output
      - NHWC
      - [W * C, N * (H+3)/4]
      - WinogradTransform and MatMul output format
    * - Width-Major Input/Output
      - NHWC
      - [(W+3)/4 * C, N * H]
      - Unused now
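
The image sizes in the table above can be computed as in the sketch below. It
assumes that ``/`` in the table denotes integer division, so that ``(C+3)/4``
rounds ``C/4`` up to the next whole pixel; ``in_out_image_shape`` and its
``layout`` argument are hypothetical names used only for illustration.

.. code-block:: python

    def in_out_image_shape(n, h, w, c, layout="channel"):
        """Return [width, height] of the 2D image for an NHWC tensor."""
        if layout == "channel":   # 4 channels packed per pixel
            return [w * ((c + 3) // 4), n * h]
        if layout == "height":    # 4 rows packed per pixel
            return [w * c, n * ((h + 3) // 4)]
        if layout == "width":     # 4 columns packed per pixel
            return [((w + 3) // 4) * c, n * h]
        raise ValueError("unknown layout: " + layout)

    print(in_out_image_shape(1, 224, 224, 3, "channel"))  # [224, 224]
    print(in_out_image_shape(1, 224, 224, 3, "height"))   # [672, 56]
    print(in_out_image_shape(1, 224, 224, 3, "width"))    # [168, 224]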

Each pixel of the **Image** contains 4 elements. The table below lists the
coordinate relationship between the **Image** and the **Buffer**.

.. list-table::
    :header-rows: 1

    * - Tensor type
      - Pixel coordinate relationship
      - Explanation
    * - Channel-Major Input/Output
      - P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=i%W, c=[i/W * 4 + k])}
      - k=[0, 4)
    * - Height-Major Input/Output
      - P[i, j] = {E[n, h, w, c] | (n=j%N, h=[j/H*4 + k], w=i%W, c=i/W)}
      - k=[0, 4)
    * - Width-Major Input/Output
      - P[i, j] = {E[n, h, w, c] | (n=j/H, h=j%H, w=[i%W*4 + k], c=i/W)}
      - k=[0, 4)
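
To make the Channel-Major row concrete, the sketch below lists the four
buffer indices covered by one pixel and reads their values from a small
tensor. The helper names are hypothetical, the tensor is a plain nested list
indexed ``[n][h][w][c]``, and padding missing channels with ``0.0`` is an
assumption, since the table does not say what out-of-range elements hold.

.. code-block:: python

    def channel_major_pixel(i, j, H, W):
        """Buffer indices (n, h, w, c) of the 4 elements in pixel P[i, j]."""
        n, h = divmod(j, H)   # n = j / H, h = j % H
        w = i % W
        c0 = (i // W) * 4     # first channel of this 4-channel block
        return [(n, h, w, c0 + k) for k in range(4)]

    def read_pixel(tensor, i, j, pad=0.0):
        """Values of pixel P[i, j]; channels beyond C are filled with pad."""
        H, W, C = len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0])
        return [tensor[n][h][w][c] if c < C else pad
                for n, h, w, c in channel_major_pixel(i, j, H, W)]

    print(channel_major_pixel(4, 3, H=2, W=3))
    # [(1, 1, 1, 4), (1, 1, 1, 5), (1, 1, 1, 6), (1, 1, 1, 7)]

    tensor = [[[[0, 1, 2], [10, 11, 12]]]]  # N=1, H=1, W=2, C=3
    print(read_pixel(tensor, 1, 0))  # [10, 11, 12, 0.0]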

Filter Tensor
~~~~~~~~~~~~~~

.. list-table::
    :header-rows: 1

    * - Tensor
      - Buffer
      - Image size [width, height]
      - Explanation
    * - Convolution Filter
      - OIHW
      - [I, (O+3)/4 * W * H]
      - Convolution filter format. There is no difference compared to [H*W*I, (O+3)/4]
    * - Depthwise Convolution Filter
      - MIHW
      - [H * W * M, (I+3)/4]
      - Depthwise-Convolution filter format
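
As a quick check of the filter image sizes above, the following sketch
evaluates both formulas for a 3x3 kernel. It assumes ``/`` in the table is
integer division; both function names are hypothetical.

.. code-block:: python

    def conv_filter_image_shape(o, i, h, w):
        """[width, height] of the image for an OIHW convolution filter."""
        return [i, ((o + 3) // 4) * w * h]

    def depthwise_filter_image_shape(m, i, h, w):
        """[width, height] of the image for an MIHW depthwise filter."""
        return [h * w * m, (i + 3) // 4]

    print(conv_filter_image_shape(32, 16, 3, 3))      # [16, 72]
    print(depthwise_filter_image_shape(1, 32, 3, 3))  # [9, 8]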

Each pixel of the **Image** contains 4 elements. The table below lists the
coordinate relationship between the **Image** and the **Buffer**.

.. list-table::
    :header-rows: 1

    * - Tensor type
      - Pixel coordinate relationship
      - Explanation
    * - Convolution Filter
      - P[m, n] = {E[o, i, h, w] | (o=[n/HW*4+k], i=m, h=T/W, w=T%W)}
      - HW= H * W, T=n%HW, k=[0, 4)
    * - Depthwise Convolution Filter
      - P[m, n] = {E[0, i, h, w] | (i=[n*4+k], h=m/W, w=m%W)}
      - Only multiplier == 1 is supported, k=[0, 4)
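
The two filter mappings above can be written out as the following sketch,
which returns the ``(o, i, h, w)`` buffer indices held by a given pixel. The
function names are hypothetical, and the depthwise variant follows the table
in fixing the multiplier index to 0.

.. code-block:: python

    def conv_filter_pixel(m, n, H, W):
        """Buffer indices (o, i, h, w) of the 4 elements in pixel P[m, n]."""
        hw = H * W
        t = n % hw            # position inside the H x W kernel window
        o0 = (n // hw) * 4    # first output channel of this 4-channel block
        return [(o0 + k, m, t // W, t % W) for k in range(4)]

    def depthwise_filter_pixel(m, n, W):
        """Same for a depthwise filter with multiplier == 1."""
        return [(0, n * 4 + k, m // W, m % W) for k in range(4)]

    print(conv_filter_pixel(5, 13, H=3, W=3))
    # [(4, 5, 1, 1), (5, 5, 1, 1), (6, 5, 1, 1), (7, 5, 1, 1)]
    print(depthwise_filter_pixel(5, 2, W=3))
    # [(0, 8, 1, 2), (0, 9, 1, 2), (0, 10, 1, 2), (0, 11, 1, 2)]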

1-D Argument Tensor
~~~~~~~~~~~~~~~~~~~~

.. list-table::
    :header-rows: 1

    * - Tensor type
      - Buffer
      - Image size [width, height]
      - Explanation
    * - 1-D Argument
      - W
      - [(W+3)/4, 1]
      - 1D argument format, e.g. Bias

Each pixel of the **Image** contains 4 elements. The table below lists the
coordinate relationship between the **Image** and the **Buffer**.

.. list-table::
    :header-rows: 1

    * - Tensor type
      - Pixel coordinate relationship
      - Explanation
    * - 1-D Argument
      - P[i, 0] = {E[w] | w=i*4+k}
      - k=[0, 4)
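
For a 1-D argument such as a bias vector, the mapping above amounts to
cutting the buffer into groups of four. The sketch below shows this with a
plain Python list; ``pack_1d`` is a hypothetical helper, and filling the last
pixel with zeros is an assumption, as the table does not specify the padding
value.

.. code-block:: python

    def pack_1d(values, pad=0.0):
        """Split a length-W buffer into (W+3)/4 pixels of 4 elements each."""
        width = (len(values) + 3) // 4
        padded = list(values) + [pad] * (width * 4 - len(values))
        # Pixel P[i, 0] holds elements w = i*4 + k for k in [0, 4).
        return [padded[i * 4:(i + 1) * 4] for i in range(width)]

    print(pack_1d([1, 2, 3, 4, 5, 6]))  # [[1, 2, 3, 4], [5, 6, 0.0, 0.0]]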