VK_QCOM_image_processing3.proposal
This document proposes a new extension that adds shader built-in functions for additional image processing operations.
Problem Statement
Popular image processing applications such as super resolution upscaling and contrast-adaptive sharpening often rely on texture gather operations, typically involving 12-tap and 5-tap gather operations respectively.
There are also usages for a linear 4-tap gather operation, such as kernels that require vectorized loads, and improving cache locality for linear access.
The latest Adreno™ GPUs feature dedicated HW shader instructions optimized for these specific gather patterns, enabling significant performance and power efficiency improvements compared to existing implementations.
Additionally, the VK_QCOM_image_processing2
extension added extended block matching operations
OpImageBlockMatchWindow*QCOM and OpImageBlockMatchGather*QCOM
without adding a new format feature flag. This limits exposing formats for block matching
operations to the common set with those provided by
VK_QCOM_image_processing.
Solution Space
Existing texture gather operations such as:
- SPIR-V:
OpImageGatherwithConstOffsetsimage operand - GLSL:
textureGatherOffsets
already support arbitrary offset patterns via constant arrays. However, these mechanisms do not guarantee alignment with hardware-accelerated gather patterns, which can lead to suboptimal performance. While implementations could enumerate supported offset patterns and allow applications to match them dynamically, this approach introduces complexity:
- Applications must restructure shaders to match enumerated patterns.
- Algorithms tightly coupled to specific gather patterns may require multiple shader variants.
- Runtime pattern matching adds overhead and reduces clarity.
Further, we have restrictions specific to these patterns, such as no support for cube map sampling, which applications may inadvertently trigger emulated paths and degrade performance. Ideally, any such restrictions would be made invalid usage to achieve more consistent performance.
Instead, we propose introducing new SPIR-V instructions that directly encode specific, hardware-accelerated gather patterns.
With regards to format feature flags, a new format feature flag for OpImageBlockMatchWindow*QCOM
and OpImageBlockMatchGather*QCOM would break backwards compatibility for apps that implement
VK_QCOM_image_processing2 when new formats are exposed with
VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM. Therefore, a new super-set format feature flag
will need to be introduced for OpImageBlockMatchSSDQCOM and OpImageBlockMatchSADQCOM instead,
despite the fact that VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM was originally introduced
specifically for them.
Proposal
The extension exposes support for a new SPIR-V instruction.
SPIR-V Built-in Functions
OpImageGatherQCOM Gathers the requested component values from the four texels which are specified with the Mode operand. Result Type must be a vector of four components of floating-point type or integer type. Its components must be the same as Sampled Type of the underlying OpTypeImage (unless that underlying Sampled Type is OpTypeVoid). It has one component per gathered texel. Sampled Image must be an object whose type is OpTypeSampledImage. Its OpTypeImage must have a Dim of 2D or Rect. The MS operand of the underlying OpTypeImage must be 0. Coordinate must be a vector of floating-point type. It contains (u, v [, array layer]) as needed by the definition of Sampled Image. Component is the component number gathered from all four texels. It must be a 32-bit integer type scalar. Behavior is undefined if its value is not 0, 1, 2 or 3. Mode specifies four texels from which the component values are gathered. It must be a 32-bit integer type scalar constant. Behavior is undefined if its value is not 0, 1, 2 or 3. Image Operands encodes what operands follow, as per Image Operands. | Capability: ImageGatherLinearQCOM, ImageGatherExtendedModesQCOM | ||||||||
7 + variable | 4545 | <id> Result Type | <id> Sampled Image | <id> Coordinate | <id> Component | <id> Mode | Optional Image Operands | Optional <id>, <id>, … | |
This functionality includes the following SPIR-V capability:
| Capability | Implicitly declares | |
|---|---|---|
4543 | ImageGatherLinearQCOM Add the extended Image Gather instruction along with the Gather4x1QCOM mode | |
4544 | ImageGatherExtendedModesQCOM Add the extended Image Gather instruction along with the GatherDQCOM, GatherH2QCOM, and GatherV2QCOM modes | |
High Level Language Exposure
The following summarizes how the built-ins are exposed in GLSL:
-------------------------------------------------------------------------------------------------
| gvec4 textureGather4x1QCOM(gsampler2D sampler, | |
| vec2 P [, int comp]) | If specified, the value of comp must |
| gvec4 textureGather4x1QCOM(gsampler2DArray | be a constant integer expression with |
| sampler, vec3 P [, int comp]) | a value of 0, 1, 2, or 3, identifying |
----------------------------------------------------| the x, y, z, or w post-swizzled |
| gvec4 textureGatherV2QCOM(gsampler2D sampler, | component of the four-component vector |
| vec2 P [, int comp]) | lookup result for each texel, |
| gvec4 textureGatherV2QCOM(gsampler2DArray | respectively. If comp is not specified, |
| sampler, vec3 P [, int comp]) | it is treated as 0, selecting the x |
----------------------------------------------------| component of each texel to generate the |
| gvec4 textureGatherH2QCOM(gsampler2D sampler, | result. |
| vec2 P [, int comp]) | |
| gvec4 textureGatherH2QCOM(gsampler2DArray | |
| sampler, vec3 P [, int comp]) | |
----------------------------------------------------| |
| gvec4 textureGatherDQCOM(gsampler2D sampler, | |
| vec2 P [, int comp]) | |
| gvec4 textureGatherDQCOM(gsampler2DArray sampler, | |
| vec3 P [, int comp]) | |
-------------------------------------------------------------------------------------------------
| gvec4 textureGather4x1OffsetQCOM(gsampler2D | |
| sampler, vec2 P, ivec2 offset [, int comp]) | Perform a texture gather operation as in |
| gvec4 textureGather4x1OffsetQCOM(gsampler2DArray | textureGather*QCOM by offset as described |
| sampler, vec3 P, ivec2 offset [, int comp]) | in textureOffset except that the offset |
----------------------------------------------------| can be variable (non constant) and the |
| gvec4 textureGatherV2OffsetQCOM(gsampler2D | implementation-dependent minimum and |
| sampler, vec2 P, ivec2 offset [, int comp]) | maximum offset values are given by |
| gvec4 textureGatherV2OffsetQCOM(gsampler2DArray | MIN_PROGRAM_TEXTURE_GATHER_OFFSET and |
| sampler, vec3 P, ivec2 offset [, int comp]) | MAX_PROGRAM_TEXTURE_GATHER_OFFSET, |
----------------------------------------------------| respectively. |
| gvec4 textureGatherH2OffsetQCOM(gsampler2D | |
| sampler, vec2 P, ivec2 offset [, int comp]) | |
| gvec4 textureGatherH2OffsetQCOM(gsampler2DArray | |
| sampler, vec3 P, ivec2 offset [, int comp]) | |
----------------------------------------------------| |
| gvec4 textureGatherDOffsetQCOM(gsampler2D | |
| sampler, vec2 P, ivec2 offset [, int comp]) | |
| gvec4 textureGatherDOffsetQCOM(gsampler2DArray | |
| sampler, vec3 P, ivec2 offset [, int comp]) | |
-------------------------------------------------------------------------------------------------
VkImage compatibility
In general, the compatibility rules for images supporting the SPIR-V OpImageGatherQCOM
are the same as the rules for OpImageGather, with the exception that the following
are not supported:
- Cube Map views
ConstOffsetsoperand- Shadow depth comparison, eg.
OpImageDrefGather - Sparse residency check, eg.
OpImageSparseGather
VkFormat support
This extension adds the following feature flag:
static const VkFormatFeatureFlagBits2 VK_FORMAT_FEATURE_2_BLOCK_MATCHING_SXD_BIT_QCOM = 0x100000000000ULL;
- VK_FORMAT_FEATURE_2_BLOCK_MATCHING_SXD_BIT_QCOM - Formats with this feature are supported
with
OpImageBlockMatchSSDQCOMandOpImageBlockMatchSADQCOM. Every format that has theVK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOMfeature flag must have this feature flag set by the implementation.
Applications should use VK_FORMAT_FEATURE_2_BLOCK_MATCHING_SXD_BIT_QCOM where
available when they wish to query the full format support for
OpImageBlockMatchSSDQCOM and OpImageBlockMatchSADQCOM operations.
The formats exposed with VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM are
still compatible with OpImageBlockMatchSSDQCOM and
OpImageBlockMatchSADQCOM operations, but the set is limited to the
formats also compatible with OpImageBlockMatchWindow*QCOM and
OpImageBlockMatchGather*QCOM.
Features
Support is indicated by the feature in a structure that extends VkPhysicalDeviceFeatures2..
typedef struct VkPhysicalDeviceImageProcessing3FeaturesQCOM {
VkStructureType sType;
void* pNext;
VkBool32 imageGatherLinear;
VkBool32 imageGatherExtendedModes;
VkBool32 blockMatchExtendedClampToEdge;
} VkPhysicalDeviceImageProcessing3FeaturesQCOM;
imageGatherLinearindicates that the implementation supports SPIR-V modules declaring theImageGatherLinearQCOMcapability.imageGatherExtendedModesindicates that the implementation supports SPIR-V modules declaring theImageGatherExtendedModesQCOMcapability.blockMatchExtendedClampToEdgeindicates that the implementation supportsVK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGEfor both input images of all block match operations