Add half precision / float16 support in Paddle
Created by: kexinzhao
Currently, the half precision floating point (float16) data type is not supported in Paddle.
Adding the float16 data type could potentially:
- reduce storage space
- reduce memory bandwidth usage
- speed up arithmetic, if supported by hardware
A brief survey of float16 support on different hardware:
ARM processor:
float16 storage and conversion to/from float32 are generally supported on ARMv7 and ARMv8.
However, float16 arithmetic has only been supported since Armv8.2-A (quote: "IEEE754-2008 formatted half-precision floating point data processing is added to Armv8.2-A").
There are currently very few devices using CPUs based on the Armv8.2-A architecture (the only one I found is the newly launched Cortex-A75, which will be used in the Qualcomm Snapdragon 845).
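For illustration, here is a minimal C++ sketch of the storage-and-conversion pattern using NEON intrinsics. The `scale_fp16` helper is hypothetical (not Paddle code) and assumes a toolchain with ACLE `float16_t` support (e.g. `-mfpu=neon-fp16` on ARMv7):

```cpp
#include <arm_neon.h>

// Hypothetical helper (not Paddle code): scales n float16 values by alpha,
// treating float16 purely as a storage type. Values are widened to float32,
// the arithmetic is done in float32, and results are narrowed back to
// float16. Assumes n is a multiple of 4 for brevity.
void scale_fp16(const float16_t* in, float16_t* out, int n, float alpha) {
  for (int i = 0; i < n; i += 4) {
    float16x4_t h = vld1_f16(in + i);    // load 4 fp16 values
    float32x4_t f = vcvt_f32_f16(h);     // widen fp16 -> fp32
    f = vmulq_n_f32(f, alpha);           // arithmetic in fp32
    vst1_f16(out + i, vcvt_f16_f32(f));  // narrow fp32 -> fp16 and store
  }
}
```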
x86/x64 CPU:
float16 is supported only as a storage type; intrinsics are available for converting between float16 and float32.
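The analogous sketch on x86 uses the F16C conversion intrinsics (again a hypothetical helper, compiled with `-mf16c`); since there are no float16 arithmetic instructions on x86, the multiply must go through float32:

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper (not Paddle code): float16 values are held as raw
// uint16_t bit patterns; F16C provides only conversion, so the arithmetic
// is done in float32. Assumes n is a multiple of 4 for brevity.
void scale_fp16(const uint16_t* in, uint16_t* out, int n, float alpha) {
  const __m128 a = _mm_set1_ps(alpha);
  for (int i = 0; i < n; i += 4) {
    __m128i h = _mm_loadl_epi64((const __m128i*)(in + i));  // load 4 fp16
    __m128 f = _mm_cvtph_ps(h);                             // fp16 -> fp32
    f = _mm_mul_ps(f, a);                                   // arithmetic in fp32
    h = _mm_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT);         // fp32 -> fp16
    _mm_storel_epi64((__m128i*)(out + i), h);               // store 4 fp16
  }
}
```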
Nvidia GPU:
fp16 storage and arithmetic have been available since CUDA 7.5 on supported GPUs (e.g. Pascal GPUs).
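On the GPU side, a minimal CUDA C++ sketch (the kernel is hypothetical, not Paddle code): native fp16 arithmetic such as `__hmul` requires compute capability >= 5.3, so a float32 fallback is guarded by `__CUDA_ARCH__`:

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel (not Paddle code): scales n __half values by alpha.
__global__ void scale_fp16(const __half* in, __half* out, int n, __half alpha) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
#if __CUDA_ARCH__ >= 530
    out[i] = __hmul(in[i], alpha);  // native fp16 multiply (e.g. Pascal)
#else
    // Older GPUs: convert to fp32, compute, and convert back.
    out[i] = __float2half(__half2float(in[i]) * __half2float(alpha));
#endif
  }
}
```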