
This document proposes a new extension that adds shader built-in functions and descriptor types for image processing.

Problem Statement

GPUs commonly process images for a wide range of use-cases. These include enhancement of externally sourced images (e.g., camera image enhancement), post processing of GPU-rendered game content, image scaling, and image analysis (e.g., motion vector generation). For common use-cases, the existing texture built-ins combined with bilinear/bicubic filtering work well. In other cases, higher-order filtering kernels or advanced image algorithms are required.

While such algorithms could be implemented generically in shader code using existing texture built-in functions, doing so requires many round-trips between the texture unit and the shader unit. The latest Adreno GPUs have dedicated HW shader instructions for such image processing tasks, enabling advanced functionality with simplified shader code. For some use-cases, significant performance and power savings are possible using dedicated texture sampling instructions.

Solution Space

Adreno GPUs have native support for multiple image processing instructions:

  • High-order (up to 64x64 kernel) filters with application-supplied weights, and sub-texel phasing support
  • High-order (up to 64x64) box filtering with HW-computed weights, and fractional box sizes
  • Block matching of pixel regions (up to 64x64) across images

These capabilities are currently not exposed in Vulkan. Exposing these instructions would provide a significant increase in functionality beyond current SPIR-V texture built-ins. Adreno GPUs exposing this extension perform the above algorithms fully inside the texture unit, saving shader instruction cycles, memory bandwidth, and shader register space.


The extension exposes support for four new SPIR-V instructions:

  • OpImageSampleWeightedQCOM: This instruction performs a weighted texture sampling operation involving two images: the sampled image and the weight image. An MxN region of texels in the sampled image is convolved with an MxN set of scalar weights provided in the weight image. Large filter sizes up to 64x64 taps enable important use-cases like edge detection, feature extraction, and anti-aliasing.
    • Sub-pixel Weighting: Frequently the texture coordinates will not align with a texel center in the sampled image, and in such cases the kernel weights can be adjusted to reflect the sub-texel sample location. Sub-texel weighting is supported, where the texel is subdivided into PxP sub-texels, called "phases", with unique weights per-phase. Adreno GPUs support up to 32x32 phases.
    • Separable-filters: Many common 2D image filtering kernels can be expressed as a mathematically equivalent 1D separable kernel. Separable filters offer significant performance/power savings over their non-separable equivalent. This instruction supports both separable and non-separable filtering kernels.
  • OpImageBoxFilterQCOM: This instruction performs a weighted average of the texels within a screen-aligned box. The operation is similar to bilinear filtering, except the region of texels is not limited to 2x2. The instruction includes a BoxSize parameter, with fractional box sizes up to [64.0, 64.0]. Similar to bilinear filtering, the implementation computes a weighted average of all texels covered by the box, with the weight for each texel proportional to its covered area. Large box sizes up to 64x64 enable important use-cases like bulk mipmap generation and high-quality single-pass image down-scaling with arbitrary scaling ratios (e.g. thumbnail generation).
  • OpImageBlockMatchSADQCOM and OpImageBlockMatchSSDQCOM: These instructions perform a block matching operation involving two images: the target image and the reference image. The instructions take two sets of integer texture coordinates and an integer BlockSize parameter. An MxN region of texels in the target image is compared with an MxN region in the reference image. The instructions return a per-component error metric describing the difference between the two regions. SAD returns the sum of absolute differences, and SSD returns the sum of squared differences.

Each of the image processing instructions operates only on 2D images. The instructions do not support sampling of mipmap, multi-plane, multi-layer, multi-sampled, or depth/stencil images. The new instructions can be used in any shader stage.

Exposing this functionality in Vulkan makes use of a corresponding SPIR-V extension, and the built-ins will be exposed in high-level languages (e.g., GLSL) via related extensions.

SPIR-V Built-in Functions

OpImageSampleWeightedQCOM

Weighted sample operation.

Result Type is the type of the result of the weighted sample operation.

Texture Sampled Image must be an object whose type is OpTypeSampledImage. The MS operand of the underlying OpTypeImage must be 0.

Coordinate must be a vector of floating-point type, whose vector size is 2.

Weight Image must be an object whose type is OpTypeSampledImage decorated with WeightTextureQCOM. The MS operand of the underlying OpTypeImage must be 0.

Operands:

  • <id> Result Type
  • Result <id>
  • <id> Texture Sampled Image
  • <id> Coordinate
  • <id> Weight Image

OpImageBoxFilterQCOM

Image box filter operation.

Result Type is the type of the result of the image box filter operation.

Texture Sampled Image must be an object whose type is OpTypeSampledImage. The MS operand of the underlying OpTypeImage must be 0.

Coordinate must be a vector of floating-point type, whose vector size is 2.

Box Size must be a vector of floating-point type, whose vector size is 2 and signedness is 0.

Operands:

  • <id> Result Type
  • Result <id>
  • <id> Texture Sampled Image
  • <id> Coordinate
  • <id> Box Size

OpImageBlockMatchSADQCOM

Image block match sum of absolute differences.

Result Type is the type of the result of the image block match sum of absolute differences.

Target Sampled Image must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the underlying OpTypeImage must be 0.

Target Coordinate must be a vector of integer type, whose vector size is 2 and signedness is 0.

Reference Sampled Image must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the underlying OpTypeImage must be 0.

Reference Coordinate must be a vector of integer type, whose vector size is 2 and signedness is 0.

Block Size must be a vector of integer type, whose vector size is 2 and signedness is 0.

Operands:

  • <id> Result Type
  • Result <id>
  • <id> Target Sampled Image
  • <id> Target Coordinate
  • <id> Reference Sampled Image
  • <id> Reference Coordinate
  • <id> Block Size

OpImageBlockMatchSSDQCOM

Image block match sum of square differences.

Result Type is the type of the result of the image block match sum of square differences.

Target Sampled Image must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the underlying OpTypeImage must be 0.

Target Coordinate must be a vector of integer type, whose vector size is 2 and signedness is 0.

Reference Sampled Image must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the underlying OpTypeImage must be 0.

Reference Coordinate must be a vector of integer type, whose vector size is 2 and signedness is 0.

Block Size must be a vector of integer type, whose vector size is 2 and signedness is 0.

Operands:

  • <id> Result Type
  • Result <id>
  • <id> Target Sampled Image
  • <id> Target Coordinate
  • <id> Reference Sampled Image
  • <id> Reference Coordinate
  • <id> Block Size

The extension adds two new SPIR-V decorations:

  • WeightTextureQCOM: Apply to a texture used as 'Weight Image' in OpImageSampleWeightedQCOM. Behavior is defined by the runtime environment.
  • BlockMatchTextureQCOM: Apply to textures used as 'Target Sampled Image' and 'Reference Sampled Image' in OpImageBlockMatchSSDQCOM/OpImageBlockMatchSADQCOM. Behavior is defined by the runtime environment.


This functionality is gated behind three SPIR-V capabilities:

  • TextureSampleWeightedQCOM: Add weighted sample operation.
  • TextureBoxFilterQCOM: Add box filter operation.
  • TextureBlockMatchQCOM: Add block matching operation (sum of absolute/square differences).

High Level Language Exposure

The following summarizes how the built-ins are exposed in GLSL:

    | Syntax                             | Description                                |
    |------------------------------------|--------------------------------------------|
    |   vec4 textureWeightedQCOM(        | Weighted sample operation multiplies       |
    |       sampler2D tex,               | a 2D kernel of filter weights with a       |
    |       vec2      P,                 | corresponding region of sampled texels and |
    |       sampler2DArray weight)       | sums the results to produce the output     |
    |                                    | value.                                     |
    |------------------------------------|--------------------------------------------|
    |   vec4 textureBoxFilterQCOM(       | Linear operation that averages the texels  |
    |       sampler2D tex,               | within the spatial region described by     |
    |       vec2      P,                 | boxSize.  The box is centered at coordinate|
    |       vec2      boxSize)           | P and has width and height of boxSize.x    |
    |                                    | and boxSize.y.                             |
    |------------------------------------|--------------------------------------------|
    |   vec4 textureBlockMatchSADQCOM(   | Block matching operation measures the      |
    |       sampler2D target,            | correlation (or similarity) of the target  |
    |       uvec2     targetCoord,       | block and reference block.  targetCoord    |
    |       sampler2D reference,         | and refCoord specify the bottom-left corner|
    |       uvec2     refCoord,          | of the block in target and reference       |
    |       uvec2     blockSize)         | images. The error metric is the Sum of     |
    |                                    | Absolute Differences (SAD).                |
    |------------------------------------|--------------------------------------------|
    |   vec4 textureBlockMatchSSDQCOM(   | Block matching operation measures the      |
    |       sampler2D target,            | correlation (or similarity) of the target  |
    |       uvec2     targetCoord,       | block and reference block.  targetCoord    |
    |       sampler2D reference,         | and refCoord specify the bottom-left corner|
    |       uvec2     refCoord,          | of the block in target and reference       |
    |       uvec2     blockSize)         | images. The error metric is the Sum of     |
    |                                    | Square Differences (SSD).                  |

Features and Properties

Support for weighted sampling, box filtering, and block matching operations is indicated by feature bits in a structure that extends VkPhysicalDeviceFeatures2.

typedef struct VkPhysicalDeviceImageProcessingFeaturesQCOM {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           textureSampleWeighted;
    VkBool32           textureBoxFilter;
    VkBool32           textureBlockMatch;
} VkPhysicalDeviceImageProcessingFeaturesQCOM;

textureSampleWeighted indicates that the implementation supports SPIR-V modules declaring the TextureSampleWeightedQCOM capability. textureBoxFilter indicates that the implementation supports SPIR-V modules declaring the TextureBoxFilterQCOM capability. textureBlockMatch indicates that the implementation supports SPIR-V modules declaring the TextureBlockMatchQCOM capability.

Implementation-specific properties are exposed in a structure that extends VkPhysicalDeviceProperties2.

typedef struct VkPhysicalDeviceImageProcessingPropertiesQCOM {
    VkStructureType    sType;
    void*              pNext;
    uint32_t           maxWeightFilterPhases;
    VkExtent2D         maxWeightFilterDimension;
    VkExtent2D         maxBlockMatchRegion;
    VkExtent2D         maxBoxFilterBlockSize;
} VkPhysicalDeviceImageProcessingPropertiesQCOM;

maxWeightFilterPhases is the maximum number of sub-pixel phases supported for OpImageSampleWeightedQCOM. maxWeightFilterDimension is the largest supported filter size (width and height) for OpImageSampleWeightedQCOM. maxBlockMatchRegion is the largest supported region size (width and height) for OpImageBlockMatchSSDQCOM and OpImageBlockMatchSADQCOM. maxBoxFilterBlockSize is the largest supported BoxSize (width and height) for OpImageBoxFilterQCOM.

VkSampler compatibility

VkSampler objects created for use with the built-ins added with this extension must be created with VK_SAMPLER_CREATE_IMAGE_PROCESSING_BIT_QCOM. Such samplers must not be used with the other existing OpImage* built-ins unrelated to this extension. In practice, this means an application must create dedicated VkSamplers for use with this extension.

The OpImageSampleWeightedQCOM and OpImageBoxFilterQCOM built-ins support samplers with unnormalizedCoordinates equal to VK_TRUE or VK_FALSE. The OpImageBlockMatchSADQCOM and OpImageBlockMatchSSDQCOM built-ins require a sampler with unnormalizedCoordinates equal to VK_TRUE.

All built-ins added with this extension support samplers with addressModeU and addressModeV equal to VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE or VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER. If VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER is used, the borderColor must be opaque black.

All built-ins added with this extension support samplers with all VkSamplerReductionModes.

The other VkSamplerCreateInfo parameters must be set to default values, but they generally have no effect on the built-ins.

VkImage compatibility

When creating a VkImage for use with the new built-ins, additional usage flags are required. VkImages must be created with VK_IMAGE_USAGE_SAMPLE_WEIGHT_BIT_QCOM when used as a weight image with OpImageSampleWeightedQCOM. VkImages must be created with VK_IMAGE_USAGE_SAMPLE_BLOCK_MATCH_BIT_QCOM when used as a reference image or target image with OpImageBlockMatchSADQCOM or OpImageBlockMatchSSDQCOM.

Descriptor Types

This extension adds two new descriptor types:


VK_DESCRIPTOR_TYPE_SAMPLE_WEIGHT_IMAGE_QCOM specifies a 2D image array descriptor for a weight image that can be used with OpImageSampleWeightedQCOM. The corresponding VkImageView must have been created with VkImageViewSampleWeightCreateInfoQCOM in the pNext chain.

VK_DESCRIPTOR_TYPE_BLOCK_MATCH_IMAGE_QCOM specifies a 2D image descriptor for the reference image or target image that can be used with OpImageBlockMatchSADQCOM or OpImageBlockMatchSSDQCOM.

VkFormat Support

Implementations advertise format support for this extension through the linearTilingFeatures or optimalTilingFeatures of VkFormatProperties3.


The SPIR-V OpImageSampleWeightedQCOM instruction takes two image parameters: the weight image which holds weight values, and the sampled image which holds the texels being sampled.

  • VK_FORMAT_FEATURE_2_WEIGHT_IMAGE_BIT_QCOM specifies that the format is supported as a weight image with OpImageSampleWeightedQCOM.
  • VK_FORMAT_FEATURE_2_WEIGHT_SAMPLED_IMAGE_BIT_QCOM specifies that the format is supported as a sampled image with OpImageSampleWeightedQCOM.

The SPIR-V OpImageBlockMatchSADQCOM and OpImageBlockMatchSSDQCOM instructions take two image parameters: the target image and the reference image.

  • VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM specifies that the format is supported as a target image or reference image with both OpImageBlockMatchSADQCOM and OpImageBlockMatchSSDQCOM.

The SPIR-V OpImageBoxFilterQCOM instruction takes one image parameter, the sampled image.

  • VK_FORMAT_FEATURE_2_BOX_FILTER_SAMPLED_BIT_QCOM specifies that the format is supported as a sampled image with OpImageBoxFilterQCOM.

Weight Image Sampling

The SPIR-V OpImageSampleWeightedQCOM instruction takes 3 operands: sampled image, weight image, and texture coordinates. The instruction computes a weighted average of an MxN region of texels in the sampled image, using a set of MxN weights in the weight image.

To create a VkImageView for the weight image, the VkImageViewCreateInfo structure is extended to provide weight filter parameters.

typedef struct VkImageViewSampleWeightCreateInfoQCOM {
    VkStructureType    sType;
    const void*        pNext;
    VkOffset2D         filterCenter;
    VkExtent2D         filterSize;
    uint32_t           numPhases;
} VkImageViewSampleWeightCreateInfoQCOM;

The texture coordinates provided to OpImageSampleWeightedQCOM, combined with filterCenter and filterSize, select a region of texels in the sampled texture:

// let (u,v) be 2D unnormalized coordinates passed to `OpImageSampleWeightedQCOM`.
// The lower-left texel of the region has integer texel coordinates (i0,j0):
i0 =  floor(u) - filterCenter.x
j0 =  floor(v) - filterCenter.y

// the upper-right texel of the region has integer coordinates (imax,jmax)
imax = i0 + filterSize.width - 1
jmax = j0 + filterSize.height - 1
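The region arithmetic above can be checked on the CPU. The following C sketch assumes unnormalized (u, v) coordinates, as the pseudocode does. `Offset2D`, `Extent2D`, `TexelRegion`, `floor_to_int`, and `weight_region` are hypothetical names for illustration. They are not Vulkan header types or functions.

```c
/* Hypothetical stand-ins for the VkOffset2D / VkExtent2D fields of
 * VkImageViewSampleWeightCreateInfoQCOM used by the region formula. */
typedef struct { int x, y; } Offset2D;
typedef struct { unsigned width, height; } Extent2D;

/* Inclusive texel bounds (i0..imax, j0..jmax) read by one weighted sample. */
typedef struct { int i0, j0, imax, jmax; } TexelRegion;

/* floor() for floats, written out to avoid depending on libm. */
static int floor_to_int(float x)
{
    int i = (int)x;                      /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;   /* fix up negative non-integers */
}

/* (u, v) are the unnormalized coordinates passed to the weighted sample. */
static TexelRegion weight_region(float u, float v,
                                 Offset2D filterCenter, Extent2D filterSize)
{
    TexelRegion r;
    r.i0   = floor_to_int(u) - filterCenter.x;
    r.j0   = floor_to_int(v) - filterCenter.y;
    r.imax = r.i0 + (int)filterSize.width  - 1;
    r.jmax = r.j0 + (int)filterSize.height - 1;
    return r;
}
```

For example, consider a 5x5 filter with filterCenter = (2, 2) sampled at (10.5, 20.5). It reads texels 8..12 horizontally and 18..22 vertically, so the texel containing the sample coordinate sits at the center of the kernel.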

If the sampler reductionMode is VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE, then the value of each texel in the region is multiplied by the associated value from the weight texture, and the weighted values are summed for each component across all texels in the region. Because the weight values are application-defined, their sum may be greater than 1.0 or less than 0.0. Therefore the filter output for a UNORM format may also be greater than 1.0 or less than 0.0.

If the sampler reductionMode is VK_SAMPLER_REDUCTION_MODE_MIN or VK_SAMPLER_REDUCTION_MODE_MAX, a component-wise minimum or maximum is computed over all texels in the region with non-zero weights.
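A CPU reference of the two reduction behaviours for a single component may help when validating weight images. This is a sketch under stated assumptions:

- The region and its weights are already flattened into matching arrays.
- `ReductionMode` and `weighted_reduce` are hypothetical names; the real selector is VkSamplerReductionMode.
- The value returned under MIN/MAX when no weight is non-zero is an arbitrary choice of this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the three VkSamplerReductionMode cases. */
typedef enum {
    REDUCE_WEIGHTED_AVERAGE,
    REDUCE_MIN,
    REDUCE_MAX
} ReductionMode;

/* texels[k] and weights[k] describe the same texel of the MxN region. */
static float weighted_reduce(const float *texels, const float *weights,
                             size_t count, ReductionMode mode)
{
    float acc = 0.0f;   /* also the (assumed) result if no weight is non-zero */
    int have = 0;       /* MIN/MAX: has a candidate texel been seen yet? */
    for (size_t k = 0; k < count; ++k) {
        if (mode == REDUCE_WEIGHTED_AVERAGE) {
            /* Plain weighted sum: no normalization by the weight total. */
            acc += texels[k] * weights[k];
        } else if (weights[k] != 0.0f) {
            /* MIN/MAX only consider texels with non-zero weights. */
            if (!have || (mode == REDUCE_MIN ? texels[k] < acc
                                             : texels[k] > acc)) {
                acc = texels[k];
                have = 1;
            }
        }
    }
    return acc;
}
```

The weighted sum is not normalized. For example, weights {0.5, 0.5, -1.0, 0.0} applied to texels {1, 2, 3, 4} give -1.5, consistent with the remark that UNORM results may fall outside [0, 1].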

Sub-texel weighting

The weight image can optionally provide sub-texel weights. This feature is enabled by setting numPhases to a value greater than 1. In this case, the weight image specifies numPhases unique sets of filterSize.width x filterSize.height weights, one set per phase.

The texels in the sampled image are subdivided both horizontally and vertically into an NxN grid of sub-texel regions, or "phases". The number of horizontal and vertical subdivisions must be equal and must be a power of two. numPhases is the product of the horizontal and vertical phase counts.

For example, numPhases equal to 4 means that each texel is divided into two vertical phases and two horizontal phases. The weight texture then defines 4 sets of weights, each with the width and height specified by filterSize. The sub-texel location of the texture coordinate determines which set of weights is used. The maximum supported values for numPhases and filterSize are given by the maxWeightFilterPhases and maxWeightFilterDimension members of VkPhysicalDeviceImageProcessingPropertiesQCOM, respectively.
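The phase-count rules above can be validated before creating the weight image view. They require numPhases = N x N, with N a power of two, bounded by maxWeightFilterPhases. Below is a minimal sketch; `phases_per_axis` is a hypothetical helper, not part of any API.

```c
#include <stdint.h>

/* Returns the per-axis phase count N when numPhases == N * N, N is a power
 * of two, and numPhases <= maxWeightFilterPhases; otherwise returns 0. */
static uint32_t phases_per_axis(uint32_t numPhases, uint32_t maxWeightFilterPhases)
{
    if (numPhases == 0 || numPhases > maxWeightFilterPhases)
        return 0;
    /* Try N = 1, 2, 4, ... until N * N exceeds numPhases.
     * A 64-bit N keeps N * N from wrapping around. */
    for (uint64_t n = 1; n * n <= numPhases; n <<= 1) {
        if (n * n == numPhases)
            return (uint32_t)n;
    }
    return 0;
}
```

With the expected Adreno limit of maxWeightFilterPhases = 1024, the largest valid value is 1024 (32 x 32 phases). Values such as 8 or 9 are rejected because they are not the square of a power of two.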

Weight Image View Type

The OpImageSampleWeightedQCOM weight image created with VkImageViewSampleWeightCreateInfoQCOM must have a viewType of either VK_IMAGE_VIEW_TYPE_1D_ARRAY which indicates separable weight encoding, or VK_IMAGE_VIEW_TYPE_2D_ARRAY which indicates non-separable weight encoding as described below.

The view type (1D array or 2D array) is the sole indication of whether the weights are separable or non-separable. No other API state or shader change designates a weight image as separable versus non-separable.

Non-Separable Weight Encoding

For non-separable weight filtering, the view type is VK_IMAGE_VIEW_TYPE_2D_ARRAY. Each layer of the 2D array corresponds to one phase of the filter. The view’s VkImageSubresourceRange::layerCount must be equal to VkImageViewSampleWeightCreateInfoQCOM::numPhases. The phases are stored as layers in the 2D array in horizontal-phase-major order, left-to-right and top-to-bottom. Expressed as a formula, the layer index for each filter phase is computed as:

layerIndex(horizPhase,vertPhase,horizPhaseCount) = (vertPhase * horizPhaseCount) + horizPhase
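The layer index formula can be expressed directly in C. The second helper, `phase_from_coord`, picks a phase from the fractional part of an unnormalized coordinate. That quantization is an assumption made for illustration only: this proposal does not define how the sub-texel location is mapped to a phase, so consult the Vulkan specification for the exact rule. All function names here are hypothetical.

```c
#include <stdint.h>

/* Layer holding the weights for (horizPhase, vertPhase), per the formula above. */
static uint32_t layer_index(uint32_t horizPhase, uint32_t vertPhase,
                            uint32_t horizPhaseCount)
{
    return (vertPhase * horizPhaseCount) + horizPhase;
}

/* floor() for floats, written out to avoid depending on libm. */
static int floor_to_int(float x)
{
    int i = (int)x;
    return (x < (float)i) ? i - 1 : i;
}

/* ASSUMED quantization (not taken from this proposal):
 * phase = floor(fract(coord) * phaseCount), clamped to phaseCount - 1. */
static uint32_t phase_from_coord(float coord, uint32_t phaseCount)
{
    float frac = coord - (float)floor_to_int(coord);  /* [0, 1], up to rounding */
    uint32_t p = (uint32_t)(frac * (float)phaseCount);
    return (p < phaseCount) ? p : phaseCount - 1;
}
```

Take a 2 x 2 phase grid and a sample at u = 10.75, v = 20.25. Under this assumption the sample selects horizPhase 1 and vertPhase 0, which is layer 1.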

For each layer, the weights are specified by the values in texels [0, 0] to [filterSize.width-1, filterSize.height-1]. While it is valid for the view’s VkImage to have a width/height larger than filterSize, image texels with integer coordinates greater than or equal to filterSize are ignored by weight sampling. The image query instructions OpImageQuerySize, OpImageQuerySizeLod, OpImageQueryLevels, and OpImageQuerySamples return undefined values for a weight image descriptor.

Separable Weight Encoding

For separable weight filtering, the view type is VK_IMAGE_VIEW_TYPE_1D_ARRAY. Horizontal weights for all phases are packed in layer 0, and vertical weights for all phases are packed in layer 1. Within each layer, the weights are arranged into groups of 4. Within each group, the weights are ordered by phase. Expressed as a formula, the 1D texel offset for all weights and phases within each layer is computed as:

// Let horizontal weights have a weightIndex of [0, filterSize.width - 1]
// Let vertical weights have a weightIndex of [0, filterSize.height - 1]
// Let phaseCount be the number of phases in either the vertical or horizontal direction.

texelOffset(phaseIndex,weightIndex,phaseCount) = (phaseCount * 4 * (weightIndex / 4)) + (phaseIndex * 4) + (weightIndex % 4)
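The separable texel-offset formula can be used to pack per-phase weight rows into one layer of the 1D array image. In this sketch, `texel_offset` transcribes the formula above. `pack_separable_layer` is a hypothetical helper. Zero-filling the unused slots of a partial final group of 4 is an assumption of this sketch, not a stated requirement.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct transcription of texelOffset() above. */
static uint32_t texel_offset(uint32_t phaseIndex, uint32_t weightIndex,
                             uint32_t phaseCount)
{
    return (phaseCount * 4u * (weightIndex / 4u))
         + (phaseIndex * 4u)
         + (weightIndex % 4u);
}

/* weights: phaseCount rows of filterLen weights each (row-major).
 * out must hold phaseCount * 4 * ceil(filterLen / 4) floats. */
static void pack_separable_layer(const float *weights, uint32_t phaseCount,
                                 uint32_t filterLen, float *out)
{
    size_t total = (size_t)phaseCount * 4u * ((filterLen + 3u) / 4u);
    for (size_t k = 0; k < total; ++k)
        out[k] = 0.0f;                       /* assumed padding value */
    for (uint32_t p = 0; p < phaseCount; ++p)
        for (uint32_t w = 0; w < filterLen; ++w)
            out[texel_offset(p, w, phaseCount)] = weights[(size_t)p * filterLen + w];
}
```

With two phases and a 5-tap filter, weights 0..3 of phase 0 land in texels 0..3, and weights 0..3 of phase 1 land in texels 4..7. Weight 4 starts a new group at texel 8 (phase 0) and texel 12 (phase 1).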

Box Filter Sampling

The SPIR-V OpImageBoxFilterQCOM instruction takes 3 operands: sampled image, box size, and texture coordinates. Note that box size specifies a floating-point width and height in texels. The instruction computes a weighted average of all texels in the sampled image that are covered (either partially or fully) by a box with the specified size and centered at the specified texture coordinates.

For each texel covered by the box, a weight value is computed by the implementation. The weight is proportional to the area of the texel covered. Those texels that are fully covered by the box receive a weight of 1.0. Those texels that are partially covered by the box receive a weight proportional to the covered area. For example, a texel that has one quarter of its area covered by the box will receive a weight of 0.25.

If the sampler reductionMode is VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE, then the value of each covered texel is multiplied by its weight, and the weighted values are summed for each component across all covered texels. The resulting sum is then divided by the box area (width x height).
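The coverage-weight rule can be sketched as a CPU reference for a single component. The sketch makes three assumptions that this proposal does not state:

- Texel (i, j) covers the unnormalized square [i, i+1) x [j, j+1).
- The image is stored row-major.
- Out-of-range texels use CLAMP_TO_EDGE.

`overlap`, `floor_to_int`, and `box_filter` are hypothetical names.

```c
/* floor() for floats, written out to avoid depending on libm. */
static int floor_to_int(float x)
{
    int i = (int)x;
    return (x < (float)i) ? i - 1 : i;
}

/* Length of the intersection of [a0, a1] and [b0, b1] (0 if disjoint). */
static float overlap(float a0, float a1, float b0, float b1)
{
    float lo = (a0 > b0) ? a0 : b0;
    float hi = (a1 < b1) ? a1 : b1;
    return (hi > lo) ? hi - lo : 0.0f;
}

/* Weighted-average box filter of a width x height single-component image. */
static float box_filter(const float *img, int width, int height,
                        float u, float v, float boxW, float boxH)
{
    float x0 = u - 0.5f * boxW, x1 = u + 0.5f * boxW;
    float y0 = v - 0.5f * boxH, y1 = v + 0.5f * boxH;
    float sum = 0.0f;
    for (int j = floor_to_int(y0); j <= floor_to_int(y1); ++j) {
        float wy = overlap(y0, y1, (float)j, (float)(j + 1));
        if (wy == 0.0f)
            continue;
        for (int i = floor_to_int(x0); i <= floor_to_int(x1); ++i) {
            /* Weight = covered area of the texel (1.0 when fully covered). */
            float w = overlap(x0, x1, (float)i, (float)(i + 1)) * wy;
            if (w == 0.0f)
                continue;
            int ci = (i < 0) ? 0 : (i >= width ? width - 1 : i);    /* clamp */
            int cj = (j < 0) ? 0 : (j >= height ? height - 1 : j);
            sum += img[cj * width + ci] * w;
        }
    }
    return sum / (boxW * boxH);   /* divide by the box area */
}
```

On a 2x2 image, a 1.0 x 1.0 box centered at (1.0, 1.0) covers one quarter of each of the four texels. Each texel therefore gets a weight of 0.25, matching the example in the text.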

If the sampler reductionMode is VK_SAMPLER_REDUCTION_MODE_MIN or VK_SAMPLER_REDUCTION_MODE_MAX, a component-wise minimum or maximum is computed over all texels covered by the box, including texels that are only partially covered.

Block Matching Sampling

The SPIR-V OpImageBlockMatchSADQCOM and OpImageBlockMatchSSDQCOM instructions each take five operands: target image, target coordinates, reference image, reference coordinates, and block size. Each instruction computes an error metric that describes whether a block of texels in the target image matches a corresponding block of texels in the reference image. The error metric is computed per component. OpImageBlockMatchSADQCOM computes the "Sum of Absolute Differences" and OpImageBlockMatchSSDQCOM computes the "Sum of Squared Differences"; otherwise the two instructions behave the same.

Both target coordinates and reference coordinates are integer texel coordinates of the lower-left texel of the block to be matched in the target image and reference image respectively. The block size provides the height and width in integer texels of the regions to be matched.

Note that the coordinates and block size may result in a region that extends beyond the bounds of the target image or the reference image. For the target image, this is valid, and the sampler addressModeU and addressModeV determine the values of such texels. For the reference image, it results in undefined values being returned. The application must therefore guarantee that the reference region does not extend beyond the bounds of the reference image.

For each texel in the regions, a difference value is computed by subtracting the target value from the reference value. OpImageBlockMatchSADQCOM computes the absolute value of the difference; this is the texel error. OpImageBlockMatchSSDQCOM computes the square of the difference; this is the texel error squared.

If the sampler reductionMode is VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE, then the texel error or texel error squared for each texel in the region is summed for each component across all texels.

If the sampler reductionMode is VK_SAMPLER_REDUCTION_MODE_MIN or VK_SAMPLER_REDUCTION_MODE_MAX, a component-wise minimum or maximum is computed over all texels in the region. OpImageBlockMatchSADQCOM returns the minimum or maximum texel error across all texels. OpImageBlockMatchSSDQCOM returns the minimum or maximum texel error squared; it does not return the minimum or maximum texel error itself.
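The block matching rules can likewise be sketched as a CPU reference for one component. The sketch assumes the following:

- Images are row-major, single-component float arrays.
- Target texels beyond the right or bottom edge use CLAMP_TO_EDGE.
- The MIN/MAX paths mirror the reduction modes above.

`Image1`, `MatchReduction`, `ref_block_in_bounds`, and `block_match` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-component image, row-major. */
typedef struct { const float *texels; uint32_t width, height; } Image1;

/* Hypothetical stand-in for the three VkSamplerReductionMode cases. */
typedef enum { MATCH_SUM, MATCH_MIN, MATCH_MAX } MatchReduction;

/* The application must keep the reference block inside the reference image. */
static bool ref_block_in_bounds(const Image1 *ref, uint32_t rx, uint32_t ry,
                                uint32_t bw, uint32_t bh)
{
    return (uint64_t)rx + bw <= ref->width && (uint64_t)ry + bh <= ref->height;
}

/* SAD when squared == false, SSD when squared == true. */
static float block_match(const Image1 *target, uint32_t tx, uint32_t ty,
                         const Image1 *ref, uint32_t rx, uint32_t ry,
                         uint32_t bw, uint32_t bh, bool squared,
                         MatchReduction mode)
{
    float acc = 0.0f;
    for (uint32_t y = 0; y < bh; ++y) {
        for (uint32_t x = 0; x < bw; ++x) {
            /* Clamp target reads that fall past the edge (CLAMP_TO_EDGE). */
            uint32_t cx = (tx + x < target->width)  ? tx + x : target->width - 1;
            uint32_t cy = (ty + y < target->height) ? ty + y : target->height - 1;
            float t = target->texels[cy * target->width + cx];
            float r = ref->texels[(ry + y) * ref->width + (rx + x)];
            float d = r - t;                          /* reference - target */
            float e = squared ? d * d : (d < 0.0f ? -d : d);
            if (mode == MATCH_SUM)
                acc += e;
            else if ((x == 0 && y == 0) ||
                     (mode == MATCH_MIN ? e < acc : e > acc))
                acc = e;
        }
    }
    return acc;
}
```

Passing the same block to both reductions shows the difference. With target {1, 2, 3, 4} and reference {1, 4, 3, 1}, SAD sums to 5 and SSD sums to 13. Under MAX, SSD returns 9, the largest squared error, rather than 3.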

Expected Features and limits

Below are the properties, features, and formats that Adreno drivers supporting this extension are expected to advertise:

Features supported in VkPhysicalDeviceImageProcessingFeaturesQCOM:

    textureSampleWeighted   = TRUE
    textureBoxFilter        = TRUE
    textureBlockMatch       = TRUE

Properties reported in VkPhysicalDeviceImageProcessingPropertiesQCOM

    maxWeightFilterPhases       = 1024
    maxWeightFilterDimension    = 64
    maxBlockMatchRegion         = 64
    maxBoxFilterBlockSize       = 64

Formats supported by sampled image parameter to OpImageSampleWeightedQCOM and OpImageBoxFilterQCOM


Formats supported by weight image parameter to OpImageSampleWeightedQCOM


Formats supported by target image or reference image parameter to OpImageBlockMatchSADQCOM and OpImageBlockMatchSSDQCOM



RESOLVED: Should this be one extension or 3 extensions?

For simplicity, and since we expect this extension to be supported only on Adreno GPUs, we propose one extension with three feature bits. The associated SPIR-V extension will have three capabilities. The associated GLSL extension will have three extension strings.

RESOLVED: How does this interact with descriptor indexing?

The new built-ins added by this extension support descriptor arrays and dynamic indexing, but only if the index is dynamically uniform. The "update-after-bind" functionality is fully supported. Non-uniform dynamic indexing is not supported. There are no feature bits for an implementation to advertise support for dynamic indexing with the shader built-ins added in this extension.

The new descriptor types for sample weight images and block match images count against the maxPerStageDescriptor[UpdateAfterBind]SampledImages and maxDescriptorSet[UpdateAfterBind]SampledImages limits.

RESOLVED: How does this extension interact with EXT_robustness2?

These instructions do not support the nullDescriptor feature of robustness2. If any descriptor accessed by these instructions is not bound, undefined results will occur.

RESOLVED: How does this interact with push descriptors?

The descriptors added by this extension can be updated using vkCmdPushDescriptorSetKHR.