提交 e980ca7a 编写于 作者: S Szabolcs Nagy 提交者: Rich Felker

define FP_FAST_FMA* when fma* can be inlined

FP_FAST_FMA can be defined if "the fma function generally executes about
as fast as, or faster than, a multiply and an add of double operands",
which can only be true if the fma call is inlined as an instruction.

gcc sets __FP_FAST_FMA if __builtin_fma is inlined as an instruction,
but that does not mean an fma call will be inlined (e.g. it is defined
with -fno-builtin-fma), other compilers (clang) don't even have such
macro, but this is the closest we can get.

(even if the libc fma implementation is a single instruction, the extern
call overhead is already too big when the macro is used to decide between
x*y+z and fma(x,y,z) so it cannot be based on libc only, defining the
macro unconditionally on targets which have fma in the base isa is also
incorrect: the compiler might not inline fma anyway.)

this solution works with gcc unless fma inlining is explicitly turned off.
上级 65c8be38
......@@ -36,6 +36,18 @@ extern "C" {
#define FP_SUBNORMAL 3
#define FP_NORMAL 4
#ifdef __FP_FAST_FMA
#define FP_FAST_FMA 1
#endif
#ifdef __FP_FAST_FMAF
#define FP_FAST_FMAF 1
#endif
#ifdef __FP_FAST_FMAL
#define FP_FAST_FMAL 1
#endif
int __fpclassify(double);
int __fpclassifyf(float);
int __fpclassifyl(long double);
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册