VK_KHR_shader_fma.proposal
Problem Statement
Fused multiply-add (FMA) operations form the basis of many high-accuracy numerical computations. The fused operation is more accurate than a separate multiply followed by an add because the product is not rounded before the addition, and it is commonly cheaper than the pair of operations. As a result, it has become a widely used and well-understood numerical building block.
The extended instruction set for SPIR-V used by Vulkan (GLSL.std.450) contains an Fma instruction, but implementations are not required to execute it as a fused operation. Applications can therefore use Fma to take any performance benefit that is available, but they get no guarantee about its accuracy. If high accuracy is required, applications must either use other algorithms that do not rely on fused multiply-add or emulate the operation themselves. Either can carry a significant cost, sometimes up to 100x.
This proposal aims to allow applications to rely on the accuracy of fused multiply-add operations.
Solution Space
Either the existing Fma SPIR-V extended instruction can be enhanced to provide the guaranteed accuracy, or a separate operation can be added.
Some implementations can gain significant performance from using the unfused variant of multiply-add and will want to retain this option for apps that use fma primitives. This means that a global enable would affect pipeline compilation in an undesirable way. A pipeline creation flag could be used, but this pulls some of the semantics of the shader out of the shader code.
Adding a separate SPIR-V instruction requires more plumbing and leaves multiple, confusingly similar ways to implement multiply-add operations.
Proposal
Create a SPIR-V extension adding OpFmaKHR to the core instruction set. Apps can continue using GLSL.std.450.Fma for cases where full accuracy is not required and can now use OpFmaKHR where it is.