@@ -28,6 +28,51 @@ The goal of float16 is to serve as a key for the executor to find and run the co
...
- [Eigen](https://github.com/RLovelett/eigen) >= 3.3 supports float16 calculation on both GPU and CPU through the `Eigen::half` class. It is most useful on NVIDIA GPUs, where its overloaded arithmetic operators map to CUDA intrinsics. On CPU it falls back to software emulation, with no special treatment for ARM processors.
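The CPU fallback mentioned above amounts to carrying out each float16 operation in software. The sketch below illustrates that general technique and is not Eigen's actual implementation. It decodes the IEEE 754 binary16 bits to `float`, does the arithmetic in `float`, and rounds back to binary16 with round-to-nearest-even. The function names `half_to_float`, `float_to_half`, and `half_add_sw` are hypothetical. The encoder also assumes the default floating-point rounding mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode IEEE 754 binary16 bits into a float. Every half value is exactly
// representable as a float, so this step never rounds.
float half_to_float(std::uint16_t h) {
  const int biased = (h >> 10) & 0x1F;  // 5-bit biased exponent
  const int mant = h & 0x3FF;           // 10-bit mantissa
  float v;
  if (biased == 0) {
    v = std::ldexp(static_cast<float>(mant), -24);  // zero or subnormal: mant * 2^-24
  } else if (biased == 0x1F) {
    v = mant ? NAN : INFINITY;                      // NaN or infinity
  } else {
    // Normal: (1 + mant/1024) * 2^(biased-15) == (1024 + mant) * 2^(biased-25)
    v = std::ldexp(static_cast<float>(mant | 0x400), biased - 25);
  }
  return (h & 0x8000) ? -v : v;
}

// Encode a float as binary16 bits, rounding to nearest even.
// Assumes the default FE_TONEAREST rounding mode, which std::nearbyint honors.
std::uint16_t float_to_half(float f) {
  const int sign = std::signbit(f) ? 0x8000 : 0;
  const float a = std::fabs(f);
  int bits;
  if (std::isnan(f)) {
    bits = 0x7E00;  // quiet NaN
  } else if (a >= 65520.0f) {
    bits = 0x7C00;  // at or above this, round-to-nearest-even overflows to infinity
  } else if (a < std::ldexp(1.0f, -14)) {
    // Subnormal range: a == m * 2^-24; m == 1024 becomes the smallest normal, 0x0400.
    bits = static_cast<int>(std::nearbyint(std::ldexp(a, 24)));
  } else {
    int e;
    const float frac = std::frexp(a, &e);  // a == frac * 2^e, frac in [0.5, 1)
    const int m = static_cast<int>(std::nearbyint(std::ldexp(frac, 11)));  // in [1024, 2048]
    // Adding instead of OR-ing lets m == 2048 carry into the exponent field.
    bits = ((e + 14) << 10) + (m - 1024);
  }
  return static_cast<std::uint16_t>(sign | bits);
}

// Software-emulated half addition: widen both operands, add in float, round back.
std::uint16_t half_add_sw(std::uint16_t a, std::uint16_t b) {
  return float_to_half(half_to_float(a) + half_to_float(b));
}
```

For example, `half_add_sw(0x3C00, 0x4000)` (1.0 + 2.0) returns `0x4200`, the binary16 encoding of 3.0. `float_to_half(0.1f)` returns `0x2E66` because 0.1 is not exactly representable in binary16.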
### CUDA version issue

There are currently three versions of CUDA that support the `__half` data type: CUDA 7.5, 8.0, and 9.0.
CUDA 7.5 and 8.0 define `__half` as a simple struct holding a single 16-bit unsigned integer (see [`cuda_fp16.h`](https://github.com/ptillet/isaac/blob/9212ab5a3ddbe48f30ef373f9c1fb546804c7a8c/include/isaac/external/CUDA/cuda_fp16.h)):
```cpp
typedef struct __align__(2) {
  unsigned short x;
} __half;
typedef __half half;
```
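Because this struct defines no overloaded arithmetic operators, you must call half-precision intrinsics such as `__hadd` directly instead of using `+` to add two half values:
```cpp
#include <cuda_fp16.h>

// Half-precision arithmetic intrinsics such as __hadd require compute capability 5.3 or higher.
__global__ void Add(const half* a, const half* b, half* c) {
  int i = threadIdx.x;
  c[i] = __hadd(a[i], b[i]);  // correct
  // c[i] = a[i] + b[i];      // compiler error in CUDA 7.5/8.0: no operator "+" matches these operands
}
```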
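You can also observe the missing operator on the host, without a GPU. The sketch below is a plain C++ analogue, not the CUDA header. The hypothetical `raw_half` mirrors the CUDA 7.5/8.0 layout, with `alignas(2)` standing in for `__align__(2)`. A small detection trait, `has_plus` (also hypothetical), checks at compile time whether `a + b` is a valid expression for a given type. The size check assumes a 16-bit `unsigned short`, which holds on the platforms CUDA targets.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Host-side analogue of the CUDA 7.5/8.0 __half: raw storage and nothing else.
struct alignas(2) raw_half {
  unsigned short x;  // IEEE 754 binary16 bit pattern
};

static_assert(sizeof(raw_half) == 2, "a single 16-bit field, no padding");
static_assert(alignof(raw_half) == 2, "2-byte alignment, like __align__(2)");

// has_plus<T>::value is true exactly when `T + T` is a well-formed expression.
template <typename T, typename = void>
struct has_plus : std::false_type {};

template <typename T>
struct has_plus<T, decltype(void(std::declval<T>() + std::declval<T>()))>
    : std::true_type {};

static_assert(!has_plus<raw_half>::value, "raw_half defines no operator+");
static_assert(has_plus<float>::value, "built-in arithmetic types do");

// Without constructors or operators, a value is built by writing its bits.
constexpr raw_half kOne{0x3C00};  // 1.0 in binary16
```

Writing `kOne + kOne` directly would fail to compile. The resulting error is analogous to the `no operator "+" matches these operands` message above.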
CUDA 9.0 substantially reworks the half data type. The relevant code is in the updated [`cuda_fp16.h`](https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.h) and the newly added [`cuda_fp16.hpp`](https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.hpp).
Essentially, CUDA 9.0 renames the original `__half` struct of CUDA 7.5 and 8.0 to `__half_raw` and defines a new `__half` class type with constructors, conversion operators, and overloaded arithmetic operators such as `operator+`.
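A host-side C++ sketch can make the shape of this design concrete. It is not NVIDIA's code. The names `half_raw_t` and `half_t` are hypothetical stand-ins for `__half_raw` and `__half`. Because the sketch runs on the CPU, the float conversions are written out in software; CUDA's device-side `operator+` forwards to `__hadd` instead. The point is the structure: a raw struct that only holds bits, wrapped by a class with constructors, conversion operators, and an overloaded `operator+`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Plain bit container, playing the role of CUDA 9.0's __half_raw.
struct half_raw_t {
  unsigned short x;  // IEEE 754 binary16 bit pattern
};

// Class type playing the role of CUDA 9.0's __half (hypothetical host-side sketch).
class half_t {
 public:
  half_t() = default;
  half_t(half_raw_t r) : x_(r.x) {}             // construct from raw bits
  explicit half_t(float f) : x_(encode(f)) {}   // construct from float, round to nearest even
  operator half_raw_t() const { return half_raw_t{x_}; }  // convert back to raw bits
  explicit operator float() const { return decode(x_); }  // exact widening to float

 private:
  unsigned short x_ = 0;

  static float decode(unsigned short h) {
    const int biased = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;
    float v;
    if (biased == 0) v = std::ldexp(static_cast<float>(mant), -24);      // zero/subnormal
    else if (biased == 0x1F) v = mant ? NAN : INFINITY;                  // NaN/infinity
    else v = std::ldexp(static_cast<float>(mant | 0x400), biased - 25);  // normal
    return (h & 0x8000) ? -v : v;
  }

  // Assumes the default FE_TONEAREST rounding mode for std::nearbyint.
  static unsigned short encode(float f) {
    const int sign = std::signbit(f) ? 0x8000 : 0;
    const float a = std::fabs(f);
    int bits;
    if (std::isnan(f)) {
      bits = 0x7E00;  // quiet NaN
    } else if (a >= 65520.0f) {
      bits = 0x7C00;  // overflows to infinity
    } else if (a < std::ldexp(1.0f, -14)) {
      bits = static_cast<int>(std::nearbyint(std::ldexp(a, 24)));  // subnormal: m * 2^-24
    } else {
      int e;
      const float frac = std::frexp(a, &e);  // a == frac * 2^e, frac in [0.5, 1)
      const int m = static_cast<int>(std::nearbyint(std::ldexp(frac, 11)));
      bits = ((e + 14) << 10) + (m - 1024);  // m == 2048 carries into the exponent
    }
    return static_cast<unsigned short>(sign | bits);
  }
};

// The overloaded operator is what lets `c = a + b` compile.
// This host sketch widens to float, adds, and rounds back.
inline half_t operator+(half_t a, half_t b) {
  return half_t(static_cast<float>(a) + static_cast<float>(b));
}
```

With this in place, `half_t c = half_t(1.0f) + half_t(2.0f);` compiles and holds 3.0 (raw bits `0x4200`). Marking the `float` constructor and conversion `explicit` is a choice made for this sketch, so that mixed expressions such as `a + 1.0f` do not silently pick a conversion path. The CUDA header's own choices may differ.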
<span id="cuda-version-issue"></span><h3>CUDA version issue<a class="headerlink" href="#cuda-version-issue" title="Permalink to this headline">¶</a></h3>
<p>There are currently three versions of CUDA that support the <code class="docutils literal"><span class="pre">__half</span></code> data type: CUDA 7.5, 8.0, and 9.0.
CUDA 7.5 and 8.0 define <code class="docutils literal"><span class="pre">__half</span></code> as a simple struct holding a single 16-bit unsigned integer (see <a class="reference external" href="https://github.com/ptillet/isaac/blob/9212ab5a3ddbe48f30ef373f9c1fb546804c7a8c/include/isaac/external/CUDA/cuda_fp16.h"><code class="docutils literal"><span class="pre">cuda_fp16.h</span></code></a>).</p>
<p>Because this struct defines no overloaded arithmetic operators, you must call half-precision intrinsics such as <code class="docutils literal"><span class="pre">__hadd</span></code> directly instead of using <code class="docutils literal"><span class="pre">+</span></code> to add two half values.</p>
<p>CUDA 9.0 substantially reworks the half data type. The relevant code is in the updated <a class="reference external" href="https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.h"><code class="docutils literal"><span class="pre">cuda_fp16.h</span></code></a> and the newly added <a class="reference external" href="https://github.com/ptillet/isaac/blob/master/include/isaac/external/CUDA/cuda_fp16.hpp"><code class="docutils literal"><span class="pre">cuda_fp16.hpp</span></code></a>.</p>
<p>Essentially, CUDA 9.0 renames the original <code class="docutils literal"><span class="pre">__half</span></code> struct of CUDA 7.5 and 8.0 to <code class="docutils literal"><span class="pre">__half_raw</span></code> and defines a new <code class="docutils literal"><span class="pre">__half</span></code> class type with constructors, conversion operators, and overloaded arithmetic operators such as <code class="docutils literal"><span class="pre">operator+</span></code>.</p>
<p>This new design makes <code class="docutils literal"><span class="pre">c</span> <span class="pre">=</span> <span class="pre">a</span> <span class="pre">+</span> <span class="pre">b</span></code> compile and work correctly for the CUDA half data type.</p>
</div>
</div>
</div>
<div class="section" id="implementation">
<div class="section" id="implementation">
<span id="implementation"></span><h2>Implementation<a class="headerlink" href="#implementation" title="Permalink to this headline">¶</a></h2>