Video Coding
Vulkan implementations may expose one or more queue families supporting video coding operations. These operations are performed by recording them into a command buffer within a video coding scope, and submitting them to queues with compatible video coding capabilities.
The Vulkan video functionalities are designed to be made available through a set of APIs built on top of each other, consisting of:
- A core API providing common video coding functionalities,
- APIs providing codec-independent video decode and video encode related functionalities, respectively,
- Additional codec-specific APIs built on top of those.
This chapter details the fundamental components and operations of these.
Video Picture Resources
In the context of video coding, multidimensional arrays of image data that can be used as the source or target of video coding operations are referred to as video picture resources. They may store additional metadata that includes implementation-private information used during the execution of video coding operations, as discussed later.
Video picture resources are backed by VkImage objects. Individual subregions of VkImageView objects created from such resources can be used as decode output pictures, encode input pictures, reconstructed pictures, and/or reference pictures.
The parameters of a video picture resource are specified using a
VkVideoPictureResourceInfoKHR
structure.
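As an informal illustration, the following sketch fills in a picture resource for a decode output picture; the image view handle and the coded dimensions are assumptions supplied by the application.

```c
// Hypothetical decode output picture; decodeOutputImageView and the coded
// dimensions are assumptions.
VkVideoPictureResourceInfoKHR pictureResource = {
    .sType            = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
    .pNext            = NULL,
    .codedOffset      = { 0, 0 },              // offset of the subregion within the image
    .codedExtent      = { 1920, 1080 },        // coded picture size
    .baseArrayLayer   = 0,                     // layer relative to the image view's array range
    .imageViewBinding = decodeOutputImageView, // VkImageView backing the picture
};
```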
Decoded Picture Buffer
An integral part of video coding pipelines is the reconstruction of pictures from a compressed video bitstream. A reconstructed picture is a video picture resource resulting from this process.
Such reconstructed pictures can be used as reference pictures in subsequent video coding operations to provide predictions of the values of samples of subsequently decoded or encoded pictures. The correct use of such reconstructed pictures as reference pictures is driven by the video compression standard, the implementation, and the application-specific use cases.
The list of reference pictures used to provide such predictions within a single video coding operation is referred to as the list of active reference pictures.
The decoded picture buffer (DPB) is an indexed data structure that maintains the set of reference pictures available to be used in video coding operations.
Individual indexed entries of the DPB are referred to as the decoded picture buffer (DPB) slots. The range of valid DPB slot indices is between zero and N-1, where N is the capacity of the DPB.
Each DPB slot can refer to a reference picture containing a video frame, or can refer to up to two reference pictures containing the top and/or bottom fields that, when both present, together represent a full video frame.
In Vulkan, the state and the backing store of the DPB are separated as follows:
- The state of individual DPB slots is maintained by video session objects.
- The backing store of DPB slots is provided by subregions of VkImage objects used as video picture resources.
In addition, the implementation may also maintain opaque metadata associated with DPB slots, including:
- Reference picture metadata corresponding to the video picture resource associated with the DPB slot.
Such metadata may be stored by the implementation as part of the DPB slot state maintained by the video session, or as part of the video picture resource backing the DPB slot.
Any metadata stored in the video picture resources backing DPB slots is independent of the video session used to store it, hence such video picture resources can be shared with other video sessions. Correspondingly, any metadata that is dependent on the video session will always be stored as part of the DPB slot state maintained by that video session.
The responsibility of managing the DPB is split between the application and the implementation as follows:
- The application maintains the association between DPB slot indices and corresponding video picture resources.
- The implementation maintains global and per-slot opaque reference picture metadata.
In addition, the application is also responsible for managing the mapping between the codec-specific picture IDs and DPB slots, and any other codec-specific states unless otherwise specified.
DPB Slot States
At a given time, each DPB slot is either in active or inactive state. Initially, all DPB slots managed by a video session are in inactive state.
A DPB slot can be activated by using it as the target of picture reconstruction in a video coding operation with the reconstructed picture requested to be set up as a reference picture, according to the codec-specific semantics. This changes its state to active and associates it with a picture reference to the reconstructed picture.
Some video coding standards allow multiple picture references to be associated with a single DPB slot. In this case the state of the individual picture references can be independently updated.
As an example, H.264 decoding allows associating a separate top field and bottom field picture with the same DPB slot.
As part of reference picture setup, the implementation may also generate reference picture metadata. Such reference picture metadata is specific to each picture reference associated with the DPB slot.
If such a video coding operation completes successfully, the activated DPB slot will have a valid picture reference and the reconstructed picture is associated with the DPB slot. This is true even if the DPB slot is used as the target of a picture reconstruction that only sets up a top field or bottom field reference picture and thus does not yet refer to a complete frame. However, if any data provided as input to such a video coding operation is not compliant with the video compression standard used, that video coding operation may complete unsuccessfully, in which case the activated DPB slot will have an invalid picture reference. This is true even if the DPB slot previously had a valid picture reference to a top field or bottom field reference picture, but the reconstruction of the other field corresponding to the DPB slot failed.
The application can use queries to get feedback about the outcome of video coding operations and use the resulting VkQueryResultStatusKHR value to determine whether the video coding operation completed successfully (result status is positive) or unsuccessfully (result status is negative).
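A minimal sketch of retrieving such a result status, assuming the device and a compatible query pool already exist:

```c
// VkQueryResultStatusKHR is a signed enum: positive means success, negative
// means the operation completed unsuccessfully, zero means not ready.
VkQueryResultStatusKHR status = VK_QUERY_RESULT_STATUS_NOT_READY_KHR;
VkResult result = vkGetQueryPoolResults(
    device, queryPool,
    0 /* firstQuery */, 1 /* queryCount */,
    sizeof(status), &status, sizeof(status),
    VK_QUERY_RESULT_WITH_STATUS_BIT_KHR | VK_QUERY_RESULT_WAIT_BIT);

if (result == VK_SUCCESS && status > 0) {
    // The video coding operation completed successfully.
} else if (status < 0) {
    // The video coding operation completed unsuccessfully.
}
```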
Using a reference picture associated with a DPB slot that has an invalid picture reference as an active reference picture in subsequent video coding operations is legal; however, the contents of the outputs of such operations are undefined, and any DPB slots activated by such video coding operations will also have an invalid picture reference. This is true even if such video coding operations otherwise complete successfully.
A DPB slot can also be deactivated by the application, changing its state to inactive and invalidating any picture references and reference picture metadata associated with the DPB slot.
If an already active DPB slot is used as the target of picture reconstruction in a video coding operation, but the decoded picture is not requested to be set up as a reference picture, according to the codec-specific semantics, no reference picture setup happens and the corresponding picture reference and reference picture metadata are invalidated within the DPB slot. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.
If an already active DPB slot is used as the target of picture reconstruction when decoding a field picture that is not marked as reference, then the behavior is as follows:
- If the DPB slot is currently associated with a frame, then the DPB slot is deactivated.
- If the DPB slot is not currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the decoded picture is a bottom field picture, then the other field picture association of the DPB slot, if any, is not disturbed.
- If the DPB slot is currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is currently associated with a bottom field picture and the decoded picture is a bottom field picture, then that picture association is invalidated, without disturbing the other field picture association, if any. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.
A DPB slot can be activated with a new frame even if it is already active. In this case all previous associations of the DPB slots with reference pictures are replaced with an association with the reconstructed picture used to activate it.
If an already active DPB slot is activated with a reconstructed field picture, then the behavior is as follows:
- If the DPB slot is currently associated with a frame, then that association is replaced with an association with the reconstructed field picture used to activate it.
- If the DPB slot is not currently associated with a top field picture and the DPB slot is activated with a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the DPB slot is activated with a bottom field picture, then the DPB slot is associated with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
- If the DPB slot is currently associated with a top field picture and the DPB slot is activated with a new top field picture, or if the DPB slot is currently associated with a bottom field picture and the DPB slot is activated with a new bottom field picture, then that association is replaced with an association with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
Video Profiles
Chroma subsampling is described in more detail in the Chroma Reconstruction section.
Video Capabilities
Video Coding Capabilities
Video Format Capabilities
Video Sessions
Creating a Video Session
Destroying a Video Session
Video Session Memory Association
After creating a video session object, and before the object can be used to record video coding operations into command buffers, the application must allocate and bind device memory to the video session. Device memory is allocated separately (see Device Memory) and then associated with the video session.
Video sessions may have multiple memory bindings identified by unique unsigned integer values. Appropriate device memory must be bound to each such memory binding before using the video session to record command buffer commands with it.
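A minimal sketch of this process, assuming `device`, `videoSession`, and a `findMemoryType()` helper already exist (error handling and `<stdlib.h>` are omitted for brevity):

```c
// Query the memory bindings required by the video session.
uint32_t bindCount = 0;
vkGetVideoSessionMemoryRequirementsKHR(device, videoSession, &bindCount, NULL);

VkVideoSessionMemoryRequirementsKHR* reqs = calloc(bindCount, sizeof(*reqs));
for (uint32_t i = 0; i < bindCount; ++i)
    reqs[i].sType = VK_STRUCTURE_TYPE_VIDEO_SESSION_MEMORY_REQUIREMENTS_KHR;
vkGetVideoSessionMemoryRequirementsKHR(device, videoSession, &bindCount, reqs);

// Allocate one VkDeviceMemory object per binding and bind all of them.
VkBindVideoSessionMemoryInfoKHR* binds = calloc(bindCount, sizeof(*binds));
for (uint32_t i = 0; i < bindCount; ++i) {
    VkMemoryAllocateInfo allocInfo = {
        .sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize  = reqs[i].memoryRequirements.size,
        .memoryTypeIndex = findMemoryType(reqs[i].memoryRequirements.memoryTypeBits),
    };
    VkDeviceMemory memory = VK_NULL_HANDLE;
    vkAllocateMemory(device, &allocInfo, NULL, &memory);

    binds[i] = (VkBindVideoSessionMemoryInfoKHR){
        .sType           = VK_STRUCTURE_TYPE_BIND_VIDEO_SESSION_MEMORY_INFO_KHR,
        .memoryBindIndex = reqs[i].memoryBindIndex, // unique binding identifier
        .memory          = memory,
        .memoryOffset    = 0,
        .memorySize      = reqs[i].memoryRequirements.size,
    };
}
vkBindVideoSessionMemoryKHR(device, videoSession, bindCount, binds);
```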
Video Profile Compatibility
Resources and query pools used with a particular video session must be compatible with the video profile the video session was created with.
A VkBuffer is compatible with a video profile if it was created with
the VkBufferCreateInfo::pNext
chain including a
VkVideoProfileListInfoKHR structure with its pProfiles
member
containing an element matching the VkVideoProfileInfoKHR structure
chain describing the video profile, and
VkBufferCreateInfo::usage
including at least one bit specific to
video coding usage.
- VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_DECODE_DST_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_ENCODE_DST_BIT_KHR
A VkBuffer is also compatible with a video profile if it was created
with VkBufferCreateInfo::flags
including
VK_BUFFER_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR
.
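A sketch of creating a bitstream buffer compatible with a specific video profile, assuming an H.264 decode use case and an application-provided `bitstreamSize`:

```c
// Codec-specific profile information (H.264 decode is assumed here).
VkVideoDecodeH264ProfileInfoKHR h264Profile = {
    .sType         = VK_STRUCTURE_TYPE_VIDEO_DECODE_H264_PROFILE_INFO_KHR,
    .stdProfileIdc = STD_VIDEO_H264_PROFILE_IDC_HIGH,
    .pictureLayout = VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_PROGRESSIVE_KHR,
};
VkVideoProfileInfoKHR profile = {
    .sType               = VK_STRUCTURE_TYPE_VIDEO_PROFILE_INFO_KHR,
    .pNext               = &h264Profile,
    .videoCodecOperation = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR,
    .chromaSubsampling   = VK_VIDEO_CHROMA_SUBSAMPLING_420_BIT_KHR,
    .lumaBitDepth        = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR,
    .chromaBitDepth      = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR,
};
VkVideoProfileListInfoKHR profileList = {
    .sType        = VK_STRUCTURE_TYPE_VIDEO_PROFILE_LIST_INFO_KHR,
    .profileCount = 1,
    .pProfiles    = &profile,
};

// The profile list in the pNext chain plus a video usage bit makes the
// buffer compatible with the listed profile.
VkBufferCreateInfo bufferInfo = {
    .sType       = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
    .pNext       = &profileList,
    .size        = bitstreamSize, // assumed application-provided size
    .usage       = VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR,
    .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
};
VkBuffer bitstreamBuffer;
vkCreateBuffer(device, &bufferInfo, NULL, &bitstreamBuffer);
```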
A VkImage is compatible with a video profile if it was created with
the VkImageCreateInfo::pNext
chain including a
VkVideoProfileListInfoKHR structure with its pProfiles
member
containing an element matching the VkVideoProfileInfoKHR structure
chain describing the video profile, and VkImageCreateInfo::usage
including at least one bit specific to video coding usage.
- VK_IMAGE_USAGE_VIDEO_DECODE_SRC_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_DST_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_DPB_BIT_KHR
A VkImage is also compatible with a video profile if all of the following conditions are true for the VkImageCreateInfo structure the image was created with:
- VkImageCreateInfo::flags included VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR.
- The list of VkVideoFormatPropertiesKHR structures, obtained by calling vkGetPhysicalDeviceVideoFormatPropertiesKHR with VkPhysicalDeviceVideoFormatInfoKHR::imageUsage equal to the VkImageCreateInfo::usage the image was created with and the VkPhysicalDeviceVideoFormatInfoKHR::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing a single array element specifying the VkVideoProfileInfoKHR structure chain describing the video profile in question, contains an element for which all of the following conditions are true with respect to the VkImageCreateInfo structure the image was created with:
  - VkImageCreateInfo::format equals VkVideoFormatPropertiesKHR::format.
  - VkImageCreateInfo::flags only contains bits also set in VkVideoFormatPropertiesKHR::imageCreateFlags.
  - VkImageCreateInfo::imageType equals VkVideoFormatPropertiesKHR::imageType.
  - VkImageCreateInfo::tiling equals VkVideoFormatPropertiesKHR::imageTiling.
  - VkImageCreateInfo::usage only contains bits also set in VkVideoFormatPropertiesKHR::imageUsageFlags.
While some of these rules allow creating buffer or image resources that may
be compatible with any video profile, applications should still prefer to
include the specific video profiles the buffer or image resource is expected
to be used with (through a VkVideoProfileListInfoKHR structure
included in the pNext
chain of the corresponding create info
structure) whenever the information about the complete set of video profiles
is available at resource creation time, to enable the implementation to
optimize the created resource for the specific use case.
In the absence of that information, the implementation may have to make
conservative decisions about the memory requirements or representation of
the resource.
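A sketch of creating a DPB image compatible with the same video profile, reusing the `profileList` from the buffer example above; the format and extent are assumptions that should instead be taken from vkGetPhysicalDeviceVideoFormatPropertiesKHR for that profile:

```c
VkImageCreateInfo dpbImageInfo = {
    .sType         = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
    .pNext         = &profileList,                       // profile(s) this image will be used with
    .imageType     = VK_IMAGE_TYPE_2D,
    .format        = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, // assumed 4:2:0 8-bit format
    .extent        = { 1920, 1088, 1 },                  // assumed aligned coded size
    .mipLevels     = 1,
    .arrayLayers   = 1,                                  // one layer per DPB slot is also common
    .samples       = VK_SAMPLE_COUNT_1_BIT,
    .tiling        = VK_IMAGE_TILING_OPTIMAL,
    .usage         = VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR |
                     VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR,
    .sharingMode   = VK_SHARING_MODE_EXCLUSIVE,
    .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
};
VkImage dpbImage;
vkCreateImage(device, &dpbImageInfo, NULL, &dpbImage);
```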
A VkImageView is compatible with a video profile if the VkImage it was created from is also compatible with that video profile.
A VkQueryPool is compatible with a video profile if it was created
with the VkQueryPoolCreateInfo::pNext
chain including a
VkVideoProfileInfoKHR structure chain describing the same video
profile, and VkQueryPoolCreateInfo::queryType
having one of the
following values:
- VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR
- VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR
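A sketch of creating a result status query pool compatible with a video profile, reusing the `profile` chain from the earlier examples:

```c
VkQueryPoolCreateInfo queryPoolInfo = {
    .sType      = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO,
    .pNext      = &profile, // VkVideoProfileInfoKHR chain describing the video profile
    .queryType  = VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR,
    .queryCount = 1,
};
VkQueryPool queryPool;
vkCreateQueryPool(device, &queryPoolInfo, NULL, &queryPool);
```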
Video Session Parameters
Video session parameters objects can store preprocessed codec-specific parameters used with a compatible video session, and enable reducing the number of parameters needed to be provided and processed by the implementation while recording video coding operations into command buffers.
Parameters stored in such objects are immutable to facilitate the concurrent use of the stored parameters in multiple threads. At the same time, new parameters can be added to existing objects using the vkUpdateVideoSessionParametersKHR command.
In order to support concurrent use of the stored immutable parameters while
also allowing the video session parameters object to be extended with new
parameters, each video session parameters object maintains an update
sequence counter that is set to 0
at object creation time and must be
incremented by each subsequent update operation.
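For example, a first update to a freshly created H.264 decode parameters object might look like the following sketch; the `newSps` and `newPps` Video Std structures are assumed to be filled by the application:

```c
VkVideoDecodeH264SessionParametersAddInfoKHR h264Add = {
    .sType       = VK_STRUCTURE_TYPE_VIDEO_DECODE_H264_SESSION_PARAMETERS_ADD_INFO_KHR,
    .stdSPSCount = 1,
    .pStdSPSs    = &newSps, // assumed StdVideoH264SequenceParameterSet
    .stdPPSCount = 1,
    .pStdPPSs    = &newPps, // assumed StdVideoH264PictureParameterSet
};
VkVideoSessionParametersUpdateInfoKHR updateInfo = {
    .sType               = VK_STRUCTURE_TYPE_VIDEO_SESSION_PARAMETERS_UPDATE_INFO_KHR,
    .pNext               = &h264Add,
    .updateSequenceCount = 1, // one greater than the previous update sequence counter
};
vkUpdateVideoSessionParametersKHR(device, videoSessionParameters, &updateInfo);
```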
Certain video sequences that adhere to particular video compression standards permit updating previously supplied parameters. If a parameter update is necessary, the application has the following options:
- Cache the set of parameters on the application side and create a new video session parameters object adding all the parameters with appropriate changes, as necessary; or
- Create a new video session parameters object providing only the updated parameters and the previously used object as the template, which ensures that parameters not specified at creation time will be copied unmodified from the template object.
The actual types of parameters that can be stored and the capacity for individual parameter types, and the methods of initializing, updating, and referring to individual parameters are specific to the video codec operation the video session parameters object was created with.
- For VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR these are defined in the H.264 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR these are defined in the H.265 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR these are defined in the AV1 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR these are defined in the H.264 Encode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR these are defined in the H.265 Encode Parameter Sets section.
Video session parameters objects created with an encode operation are further specialized based on the video encode quality level the video session parameters are used with, as implementations may apply different sets of parameter overrides depending on the used quality level. This enables implementations to store the potentially optimized set of parameters in these objects, further limiting the necessary processing required while recording video encode operations into command buffers.
Creating Video Session Parameters
Destroying Video Session Parameters
Updating Video Session Parameters
Video Coding Scope
Applications can record video coding commands for a video session only within a video coding scope.
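A minimal sketch of such a scope, assuming the command buffer, video session, and session parameters handles already exist:

```c
VkVideoBeginCodingInfoKHR beginInfo = {
    .sType                  = VK_STRUCTURE_TYPE_VIDEO_BEGIN_CODING_INFO_KHR,
    .videoSession           = videoSession,
    .videoSessionParameters = videoSessionParameters,
    .referenceSlotCount     = 0,    // DPB slots bound for the duration of the scope
    .pReferenceSlots        = NULL,
};
vkCmdBeginVideoCodingKHR(cmdBuffer, &beginInfo);

// Reset the video session state the first time the session is used.
VkVideoCodingControlInfoKHR controlInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_CODING_CONTROL_INFO_KHR,
    .flags = VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR,
};
vkCmdControlVideoCodingKHR(cmdBuffer, &controlInfo);

// ... vkCmdDecodeVideoKHR or vkCmdEncodeVideoKHR calls go here ...

VkVideoEndCodingInfoKHR endInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_END_CODING_INFO_KHR,
};
vkCmdEndVideoCodingKHR(cmdBuffer, &endInfo);
```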
Video Coding Control
Inline Queries
If a video session was created with
VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR
, beginning queries
using commands such as vkCmdBeginQuery within a video coding scope is
not allowed.
Instead, queries are executed inline by including an instance of the
VkVideoInlineQueryInfoKHR structure in the pNext
chain of the
parameters of one of the video coding commands, with its queryPool
member set to a valid VkQueryPool
handle.
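A sketch of chaining an inline query into the parameters of a decode command, assuming an existing result status query pool:

```c
VkVideoInlineQueryInfoKHR inlineQuery = {
    .sType      = VK_STRUCTURE_TYPE_VIDEO_INLINE_QUERY_INFO_KHR,
    .queryPool  = queryPool, // valid VkQueryPool handle
    .firstQuery = 0,
    .queryCount = 1,
};
VkVideoDecodeInfoKHR decodeInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
    .pNext = &inlineQuery, // executed inline instead of vkCmdBeginQuery/vkCmdEndQuery
    /* ... remaining decode parameters ... */
};
```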
Video Decode Operations
Video decode operations consume compressed video data from a video bitstream buffer and zero or more reference pictures, and produce a decode output picture and an optional reconstructed picture.
Such decode output pictures can be shared with the Decoded Picture Buffer, and can also be used as the input of video encode operations, with graphics or compute operations, or with Window System Integration APIs, depending on the capabilities of the implementation.
Video decode operations may access the following resources in the
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR
stage:
- The source video bitstream buffer range and the image subregions corresponding to the list of active reference pictures with access VK_ACCESS_2_VIDEO_DECODE_READ_BIT_KHR.
- The image subregions corresponding to the target decode output picture and reconstructed picture with access VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR.
The image subresource of each video picture resource accessed by the video decode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:
- If the image subresource is used in the video decode operation only as decode output picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR layout.
- If the image subresource is used in the video decode operation both as decode output picture and reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
- If the image subresource is used in the video decode operation only as reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
- If the image subresource is used in the video decode operation as a reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
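A sketch of transitioning a DPB image subresource into the layout expected by decode operations, assuming synchronization2 is enabled and the `dpbImage` handle exists:

```c
VkImageMemoryBarrier2 barrier = {
    .sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER_2,
    .srcStageMask        = VK_PIPELINE_STAGE_2_NONE,
    .srcAccessMask       = VK_ACCESS_2_NONE,
    .dstStageMask        = VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
    .dstAccessMask       = VK_ACCESS_2_VIDEO_DECODE_READ_BIT_KHR |
                           VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR,
    .oldLayout           = VK_IMAGE_LAYOUT_UNDEFINED,
    .newLayout           = VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image               = dpbImage,
    .subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
};
VkDependencyInfo dep = {
    .sType                   = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
    .imageMemoryBarrierCount = 1,
    .pImageMemoryBarriers    = &barrier,
};
vkCmdPipelineBarrier2(cmdBuffer, &dep);
```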
A video decode operation may complete unsuccessfully. In this case the decode output picture will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.
Codec-Specific Semantics
The following aspects of video decode operations are codec-specific:
- The interpretation of the contents of the source video bitstream buffer range.
- The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
- The construction and interpretation of information related to the decode output picture and the generation of picture data to the corresponding image subregion.
- The decision on reference picture setup.
- The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.
These codec-specific behaviors are defined for each video codec operation separately.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.264 Decode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.265 Decode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the AV1 Decode Operations section.
Video Decode Operation Steps
Each video decode operation performs the following steps in the
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR
stage:
- Reads the encoded video data from the source video bitstream buffer range;
- Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process;
- Writes the decoded picture data to the decode output picture, and optionally to the reconstructed picture, if one is specified and is different from the decode output picture, according to the codec-specific semantics;
- If reference picture setup is requested, activates the DPB slot index specified in the reconstructed picture information with the reconstructed picture.
When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.
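A sketch of recording one such decode operation; the bitstream buffer, picture resources, and the codec-specific structures chained through pNext are assumptions set up by the application beforehand:

```c
VkVideoReferenceSlotInfoKHR setupSlot = {
    .sType            = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
    .pNext            = &h264DpbSlotInfo,    // e.g. VkVideoDecodeH264DpbSlotInfoKHR
    .slotIndex        = 0,                   // DPB slot to activate on reference picture setup
    .pPictureResource = &dpbPictureResource, // reconstructed picture backing store
};
VkVideoDecodeInfoKHR decodeInfo = {
    .sType               = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
    .pNext               = &h264PictureInfo,     // e.g. VkVideoDecodeH264PictureInfoKHR
    .srcBuffer           = bitstreamBuffer,      // source video bitstream buffer
    .srcBufferOffset     = 0,
    .srcBufferRange      = bitstreamSize,
    .dstPictureResource  = decodeOutputResource, // decode output picture
    .pSetupReferenceSlot = &setupSlot,           // requests reference picture setup
    .referenceSlotCount  = 0,                    // no active reference pictures in this sketch
    .pReferenceSlots     = NULL,
};
vkCmdDecodeVideoKHR(cmdBuffer, &decodeInfo);
```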
Capabilities
Video Decode Commands
H.264 Decode Operations
Video decode operations using an H.264 decode profile can be used to decode elementary video stream sequences compliant to the ITU-T H.264 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH264SequenceParameterSet structure corresponding to the active SPS specifying the H.264 sequence parameter set.
  - The StdVideoH264PictureParameterSet structure corresponding to the active PPS specifying the H.264 picture parameter set.
  - The StdVideoDecodeH264PictureInfo structure specifying the H.264 picture information.
  - The StdVideoDecodeH264ReferenceInfo structures specifying the H.264 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the H.264 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the H.264 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.264 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
H.264 Decode Bitstream Data Access
If the target decode output picture is a frame, then the video bitstream buffer range should contain a VCL NAL unit comprised of the slice headers and data of a picture representing an entire frame, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
If the target decode output picture is a field, then the video bitstream buffer range should contain a VCL NAL unit comprised of the slice headers and data of a picture representing a field, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
The offsets provided in
VkVideoDecodeH264PictureInfoKHR::pSliceOffsets
should specify
the starting offsets corresponding to each slice header within the video
bitstream buffer range.
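A sketch of providing these offsets through the H.264 picture information, assuming a single slice starting at the beginning of the bitstream buffer range and an application-filled `stdPictureInfo`:

```c
uint32_t sliceOffsets[] = { 0 }; // offsets of each slice header within the range
VkVideoDecodeH264PictureInfoKHR h264PictureInfo = {
    .sType           = VK_STRUCTURE_TYPE_VIDEO_DECODE_H264_PICTURE_INFO_KHR,
    .pStdPictureInfo = &stdPictureInfo, // assumed StdVideoDecodeH264PictureInfo
    .sliceCount      = 1,
    .pSliceOffsets   = sliceOffsets,
};
```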
H.264 Decode Picture Data Access
The effective imageOffset
and imageExtent
corresponding to a
decode output picture,
reference picture, or
reconstructed picture used in video decode
operations with an H.264 decode profile are defined
as follows:
- imageOffset is (codedOffset.x, codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a frame.
- imageOffset is (codedOffset.x, codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- imageOffset is (codedOffset.x, codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height / 2), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
Where codedOffset
and codedExtent
are the members of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
However, accesses to image data within a video picture resource happen at
the granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
This means that the complete image subregion accessed by video coding
operations using an H.264 decode profile for the
video picture resource is defined as the set of texels within the coordinate
range:
- ([startX, endX), [startY, endY))
Where:
- startX equals imageOffset.x rounded down to the nearest integer multiple of pictureAccessGranularity.width;
- endX equals imageOffset.x + imageExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- startY equals imageOffset.y rounded down to the nearest integer multiple of pictureAccessGranularity.height;
- endY equals imageOffset.y + imageExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
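The rounding rules above can be written out as the following sketch, where the variables mirror the definitions and the subresource dimensions are assumed to be known:

```c
uint32_t gw = pictureAccessGranularity.width;
uint32_t gh = pictureAccessGranularity.height;

// Round the start down and the end up to the access granularity.
uint32_t startX = ((uint32_t)imageOffset.x / gw) * gw;
uint32_t endX   = (((uint32_t)imageOffset.x + imageExtent.width  + gw - 1) / gw) * gw;
uint32_t startY = ((uint32_t)imageOffset.y / gh) * gh;
uint32_t endY   = (((uint32_t)imageOffset.y + imageExtent.height + gh - 1) / gh) * gh;

// Clamp to the dimensions of the image subresource bound through
// VkVideoPictureResourceInfoKHR.
if (endX > subresourceWidth)  endX = subresourceWidth;
if (endY > subresourceHeight) endY = subresourceHeight;
```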
In case of video decode operations using an H.264
decode profile, any access to a picture at the coordinates
(x
,y
), as defined by the ITU-T H.264
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
specified below:
- (x, y), if the accessed picture represents a frame.
- (x, y × 2), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- (x, y × 2 + 1), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- (x, y), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
- (codedOffset.x + x, codedOffset.y + y), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
Where codedOffset
is the member of the corresponding
VkVideoPictureResourceInfoKHR structure.
H.264 Decode Profile
H.264 Decode Capabilities
H.264 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
can contain the following types of parameters:
H.264 Sequence Parameter Sets (SPS)
Represented by StdVideoH264SequenceParameterSet
structures and
interpreted as follows:
- reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
- seq_parameter_set_id is used as the key of the SPS entry;
- level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
- if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
- if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
  - reserved1 is used only for padding purposes and is otherwise ignored;
  - if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - reserved1 is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
- all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.
H.264 Picture Parameter Sets (PPS)
Represented by StdVideoH264PictureParameterSet
structures and
interpreted as follows:
- the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
- if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
- all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.
H.264 Decoding Parameters
H.264 Decode Requirements
This section describes the required H.264 decoding capabilities for
physical devices that have at least one queue family that supports the video
codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
, as
returned by vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
Video Std Header Name | Version
---|---
vulkan_video_codec_h264std_decode | 1.0.0

Video Capability | Requirement | Requirement Type
---|---|---
 | - | min
 | 4096 | max
 | 4096 | max
 | (64,64) | max
 | - | max
 | - | min
 | 0 | min
 | 0 | min
 |  | min
 |  | min
 | (0,0) except for profiles using VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR | implementation-dependent
H.265 Decode Operations
Video decode operations using an H.265 decode profile can be used to decode elementary video stream sequences compliant to the ITU-T H.265 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.265 Specification:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH265VideoParameterSet structure corresponding to the active VPS specifying the H.265 video parameter set.
  - The StdVideoH265SequenceParameterSet structure corresponding to the active SPS specifying the H.265 sequence parameter set.
  - The StdVideoH265PictureParameterSet structure corresponding to the active PPS specifying the H.265 picture parameter set.
  - The StdVideoDecodeH265PictureInfo structure specifying the H.265 picture information.
  - The StdVideoDecodeH265ReferenceInfo structures specifying the H.265 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the H.265 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the H.265 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.265 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.265 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
H.265 Decode Bitstream Data Access
The video bitstream buffer range should contain a VCL NAL unit comprised of the slice segment headers and data of a picture representing a frame, as defined in sections 7.3.6 and 7.3.8, and this data is interpreted as defined in sections 7.4.7 and 7.4.9 of the ITU-T H.265 Specification, respectively.
The offsets provided in
VkVideoDecodeH265PictureInfoKHR::pSliceSegmentOffsets
should
specify the starting offsets corresponding to each slice segment header
within the video bitstream buffer range.
H.265 Decode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of a
decode output picture,
reference picture, or
reconstructed picture accessed by video coding
operations using an H.265 decode profile is defined
as the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
Where codedExtent
is the member of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
In case of video decode operations using an H.265
decode profile, any access to a picture at the coordinates
(x
,y
), as defined by the ITU-T H.265
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
(x
,y
).
H.265 Decode Profile
H.265 Decode Capabilities
H.265 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR
can contain the following types of parameters:
H.265 Video Parameter Sets (VPS)
Represented by StdVideoH265VideoParameterSet
structures and interpreted
as follows:
- reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
- vps_video_parameter_set_id is used as the key of the VPS entry;
- the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to vps_max_latency_increase_plus1, vps_max_dec_pic_buffering_minus1, and vps_max_num_reorder_pics, respectively, as defined in section 7.4.3.1 of the ITU-T H.265 Specification;
- the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
  - reserved is used only for padding purposes and is otherwise ignored;
  - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
  - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
  - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
  - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
    - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
  - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
    - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
- the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
  - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
- all other members of StdVideoH265VideoParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
H.265 Sequence Parameter Sets (SPS)
Represented by StdVideoH265SequenceParameterSet
structures and
interpreted as follows:
- reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
- the pair constructed from sps_video_parameter_set_id and sps_seq_parameter_set_id is used as the key of the SPS entry;
- the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
  - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
- the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to sps_max_latency_increase_plus1, sps_max_dec_pic_buffering_minus1, and sps_max_num_reorder_pics, respectively, as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
- if flags.sps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
- pShortTermRefPicSet is a pointer to an array of num_short_term_ref_pic_sets number of StdVideoH265ShortTermRefPicSet structures where each element is interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - used_by_curr_pic_flag is a bitmask where bit index i corresponds to used_by_curr_pic_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - use_delta_flag is a bitmask where bit index i corresponds to use_delta_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - used_by_curr_pic_s0_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s0_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - used_by_curr_pic_s1_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s1_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265ShortTermRefPicSet are interpreted as defined in section 7.4.8 of the ITU-T H.265 Specification;
- if flags.long_term_ref_pics_present_flag is set, then the StdVideoH265LongTermRefPicsSps structure pointed to by pLongTermRefPicsSps is interpreted as follows:
  - used_by_curr_pic_lt_sps_flag is a bitmask where bit index i corresponds to used_by_curr_pic_lt_sps_flag[i] as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265LongTermRefPicsSps are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
- if flags.vui_parameters_present_flag is set, then the StdVideoH265SequenceParameterSetVui structure pointed to by pSequenceParameterSetVui is interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
  - all other members of pSequenceParameterSetVui are interpreted as defined in section E.3.1 of the ITU-T H.265 Specification;
- if flags.sps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
- all other members of StdVideoH265SequenceParameterSet are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification.
H.265 Picture Parameter Sets (PPS)
Represented by StdVideoH265PictureParameterSet
structures and
interpreted as follows:
- reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
- the triplet constructed from sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id is used as the key of the PPS entry;
- if flags.pps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
- if flags.pps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
- all other members of StdVideoH265PictureParameterSet are interpreted as defined in section 7.4.3.3 of the ITU-T H.265 Specification.
H.265 Decoding Parameters
H.265 Decode Requirements
This section describes the required H.265 decoding capabilities for
physical devices that have at least one queue family that supports the video
codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR
, as
returned by vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
Video Std Header Name | Version
---|---
vulkan_video_codec_h265std_decode | 1.0.0

Video Capability | Requirement | Requirement Type
---|---|---
 | - | min
 | 4096 | max
 | 4096 | max
 | (64,64) | max
 | - | max
 | - | min
 | 0 | min
 | 0 | min
 |  | min
 |  | min
AV1 Decode Operations
Video decode operations using an AV1 decode profile can be used to decode elementary video stream sequences compliant with the AV1 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 7 of the AV1 Specification:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoAV1SequenceHeader structure stored in the bound video session parameters object specifying the active sequence header.
  - The StdVideoDecodeAV1PictureInfo structure specifying the AV1 picture information.
  - The StdVideoDecodeAV1ReferenceInfo structures specifying the AV1 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the AV1 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the AV1 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the AV1 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the AV1 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
AV1 Decode Bitstream Data Access
The video bitstream buffer range should contain one or more frame OBUs, comprised of a frame header OBU and tile group OBU, that together represent an entire frame, as defined in sections 5.10, 5.9, and 5.11, and this data is interpreted as defined in sections 6.9, 6.8, and 6.10 of the AV1 Specification, respectively.
The offset specified in
VkVideoDecodeAV1PictureInfoKHR::frameHeaderOffset
should
specify the starting offset of the frame header OBU of the frame.
When the tiles of the frame are encoded into multiple tile groups, each
frame OBU has a separate frame header OBU but their content is expected to
match per the requirements of the AV1 Specification.
Accordingly, the offset specified in frameHeaderOffset
can be the
offset of any of the otherwise identical frame header OBUs when multiple
tile groups are present.
The offsets and sizes provided in
VkVideoDecodeAV1PictureInfoKHR::pTileOffsets
and
VkVideoDecodeAV1PictureInfoKHR::pTileSizes
, respectively,
should specify the starting offsets and sizes corresponding to each tile
within the video bitstream buffer range.
AV1 Decode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of a
decode output picture,
reference picture, or
reconstructed picture accessed by video coding
operations using an AV1 decode profile is defined as
the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
Where codedExtent
is the member of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
In case of video decode operations using an AV1 decode
profile, any access to a picture at the coordinates
(x
,y
), as defined by the AV1
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
(x
,y
).
AV1 Reference Names and Semantics
Individual reference frames used in the decoding process have different
semantics, as defined in section 6.10.24 of the AV1
Specification.
The AV1 semantics associated with a reference picture are indicated by the
corresponding enumeration constant defined in the Video Std enumeration type
StdVideoAV1ReferenceName
:
- STD_VIDEO_AV1_REFERENCE_NAME_INTRA_FRAME identifies the reference used for intra coding (INTRA_FRAME), as defined in sections 2 and 7.11.2 of the AV1 Specification.
- All other enumeration constants refer to backward or forward references used for inter coding, as defined in sections 2 and 7.11.3 of the AV1 Specification:
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME identifies the LAST_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST2_FRAME identifies the LAST2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST3_FRAME identifies the LAST3_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_GOLDEN_FRAME identifies the GOLDEN_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_BWDREF_FRAME identifies the BWDREF_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF2_FRAME identifies the ALTREF2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF_FRAME identifies the ALTREF_FRAME reference
These enumeration constants are not directly used in any APIs but are used to indirectly index into certain Video Std and Vulkan API parameter arrays.
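For instance, a common indexing pattern (a sketch; the picture info members, the one-based offset into the inter reference names, and the DPB slot assignments are assumptions based on how the decode AV1 picture information is typically filled) looks like this:

```c
VkVideoDecodeAV1PictureInfoKHR av1PictureInfo = {
    .sType             = VK_STRUCTURE_TYPE_VIDEO_DECODE_AV1_PICTURE_INFO_KHR,
    .pStdPictureInfo   = &stdAv1PictureInfo, // assumed StdVideoDecodeAV1PictureInfo
    .frameHeaderOffset = 0,
    .tileCount         = 1,
    .pTileOffsets      = tileOffsets,        // assumed tile offset/size arrays
    .pTileSizes        = tileSizes,
};

// Mark all inter reference names as not associated with any DPB slot first.
for (uint32_t i = 0; i < VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR; ++i)
    av1PictureInfo.referenceNameSlotIndices[i] = -1;

// Element i corresponds to the inter reference name with value i + 1, so the
// reference name constant minus one indexes the array (assumed mapping).
av1PictureInfo.referenceNameSlotIndices[STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME - 1]   = 0;
av1PictureInfo.referenceNameSlotIndices[STD_VIDEO_AV1_REFERENCE_NAME_GOLDEN_FRAME - 1] = 1;
```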
AV1 Decode Profile
AV1 Decode Capabilities
AV1 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR
contain a single instance of the following parameter set:
AV1 Sequence Header
Represented by StdVideoAV1SequenceHeader
structures and interpreted as
follows:
- flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
- the StdVideoAV1ColorConfig structure pointed to by pColorConfig is interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - all other members of StdVideoAV1ColorConfig are interpreted as defined in section 6.4.2 of the AV1 Specification;
- if flags.timing_info_present_flag is set, then the StdVideoAV1TimingInfo structure pointed to by pTimingInfo is interpreted as follows:
  - flags.reserved is used only for padding purposes and is otherwise ignored;
  - all other members of StdVideoAV1TimingInfo are interpreted as defined in section 6.4.3 of the AV1 Specification;
- all other members of StdVideoAV1SequenceHeader are interpreted as defined in section 6.4 of the AV1 Specification.
AV1 Decoding Parameters
AV1 Decode Requirements
This section describes the required AV1 decoding capabilities for physical
devices that have at least one queue family that supports the video codec
operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR
, as returned by
vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
Video Std Header Name | Version
---|---
vulkan_video_codec_av1std_decode | 1.0.0

Video Capability | Requirement | Requirement Type
---|---|---
 | - | min
 | 4096 | max
 | 4096 | max
 | (64,64) | max
 | - | max
 | - | min
 | 0 | min
 | 0 | min
 |  | min
 |  | min
Video Encode Operations
Video encode operations consume an encode input picture and zero or more reference pictures, and produce compressed video data to a video bitstream buffer and an optional reconstructed picture.
Such encode input pictures can be used as the output of video decode operations, with graphics or compute operations, or with Window System Integration APIs, depending on the capabilities of the implementation.
Video encode operations may access the following resources in the
VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR
stage:
- The image subregions corresponding to the source encode input picture and the active reference pictures with access VK_ACCESS_2_VIDEO_ENCODE_READ_BIT_KHR.
- The destination video bitstream buffer range and the optional reconstructed picture with access VK_ACCESS_2_VIDEO_ENCODE_WRITE_BIT_KHR.
The image subresource of each video picture resource accessed by the video encode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:
- If the image subresource is used in the video encode operation as an encode input picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR layout.
- If the image subresource is used in the video encode operation as a reconstructed picture or reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_DPB_KHR layout.
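For example, before an image is used as an encode input picture, the application has to transition the corresponding image subresource to the VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR layout. The hedged sketch below shows one way to do this with a synchronization2 image memory barrier; the image handle, its previous usage as a transfer destination, and the single-layer subresource range are assumptions of this illustration.

```c
// Hedged sketch: transitioning an image that will be used as an encode input
// picture into VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR before recording the
// encode operation. The image handle and prior usage are assumptions.
VkImageMemoryBarrier2 toEncodeSrc = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER_2,
    .srcStageMask = VK_PIPELINE_STAGE_2_COPY_BIT,           // assumed prior upload
    .srcAccessMask = VK_ACCESS_2_TRANSFER_WRITE_BIT,
    .dstStageMask = VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR,
    .dstAccessMask = VK_ACCESS_2_VIDEO_ENCODE_READ_BIT_KHR,
    .oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    .newLayout = VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image = encodeInputImage,                               // assumed handle
    .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
};

VkDependencyInfo dependencyInfo = {
    .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
    .imageMemoryBarrierCount = 1,
    .pImageMemoryBarriers = &toEncodeSrc,
};

vkCmdPipelineBarrier2(commandBuffer, &dependencyInfo);
```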
A video encode operation may complete unsuccessfully. In this case the target video bitstream buffer will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.
If a video encode operation completes successfully and the codec-specific parameters provided by the application adhere to the syntactic and semantic requirements defined in the corresponding video compression standard, then the target video bitstream buffer will contain compressed video data after the execution of the video encode operation according to the respective codec-specific semantics.
Codec-Specific Semantics
The following aspects of video encode operations are codec-specific:
- The compressed video data written to the target video bitstream buffer range.
- The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
- The construction and interpretation of information related to the encode input picture and the interpretation of the picture data referred to by the corresponding image subregion.
- The decision on reference picture setup.
- The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.
- Certain aspects of rate control.
These codec-specific behaviors are defined for each video codec operation separately.
- If the used video codec operation is
VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR
, then the codec-specific aspects of the video encoding process are performed as defined in the H.264 Encode Operations section. - If the used video codec operation is
VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR
, then the codec-specific aspects of the video encoding process are performed as defined in the H.265 Encode Operations section.
Video Encode Parameter Overrides
Implementations supporting video encode operations for any particular video codec operation often support only a subset of the available encoding tools defined by the corresponding video compression standards. Accordingly, certain implementation-dependent limitations may apply to codec-specific parameters provided through the structures defined in the Video Std headers corresponding to the used video codec operation.
Exposing all of these restrictions on particular codec-specific parameter values or combinations thereof in the form of application-queryable capabilities is impractical, hence this specification allows implementations to override the value of any of the codec-specific parameters, unless otherwise specified, as long as all of the following conditions are met:
- If the application-provided codec-specific parameters adhere to the syntactic and semantic requirements and rules defined by the used video compression standard, and thus would be usable to produce a video bitstream compliant with that standard, then the codec-specific parameters resulting from the process of implementation overrides must also adhere to the same requirements and rules, and any video bitstream produced using the overridden parameters must also be compliant.
- The overridden codec-specific parameter values must not have an impact on the codec-independent behaviors defined for video encode operations.
- The implementation must not override any codec-specific parameters specified to a command that may cause application-provided codec-specific parameters specified to subsequent commands to no longer adhere to the semantic requirements and rules defined by the used video compression standard, unless the implementation also overrides those parameters to adhere to any such requirements and rules.
- The overridden codec-specific parameter values must not have an impact on the codec-specific picture data access semantics.
- The overridden codec-specific parameter values may change the contents of the codec-specific bitstream elements produced by video encode operations or otherwise retrieved by the application (e.g. using the vkGetEncodedVideoSessionParametersKHR command) but must still adhere to the codec-specific semantics defined for that video codec operation, including, but not limited to, the number, type, and order of the encoded codec-specific bitstream elements.
Besides codec-specific parameter overrides performed for implementation-dependent reasons, applications can enable the implementation to apply additional optimizing overrides that may improve the efficiency or performance of video encoding operations. However, implementations must meet the conditions listed above even in case of such optimizing overrides.
Unless the application opts in for optimizing overrides, implementations are not expected to override any of the codec-specific parameters, except when such overrides are necessary for the correct operation of the video encoder implementation due to limitations of the available encoding tools on that implementation.
Video Encode Operation Steps
Each video encode operation performs the following steps in the
VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR
stage:
- Reads the input picture data from the encode input picture;
- Determines derived encoding quality parameters according to the codec-specific semantics and the current rate control state;
- Compresses the input picture data according to the codec-specific semantics, applying any prediction data read from the active reference pictures and rate control restrictions in the process;
- Writes the encoded bitstream data to the destination video bitstream buffer range;
- Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process, if a reconstructed picture is specified and reference picture setup is requested;
- If reference picture setup is requested, the DPB slot index specified in the reconstructed picture information is activated with the reconstructed picture;
- Writes the reconstructed picture data to the reconstructed picture, if one is specified, according to the codec-specific semantics.
When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.
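The hedged sketch below illustrates how these steps are triggered by recording a single encode operation with vkCmdEncodeVideoKHR; it assumes the command buffer is already inside a video coding scope, and all handles, sizes, and the codec-specific picture information chained through pNext are placeholders.

```c
// Hedged sketch: recording a single video encode operation. The command
// buffer is assumed to be inside a video coding scope (begun with
// vkCmdBeginVideoCodingKHR); all handles, extents, and the codec-specific
// pNext structure are assumptions of this illustration.
VkVideoPictureResourceInfoKHR inputPicture = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
    .codedOffset = { 0, 0 },
    .codedExtent = { 1920, 1080 },             // assumed coded size
    .baseArrayLayer = 0,
    .imageViewBinding = encodeInputImageView,  // assumed image view
};

VkVideoReferenceSlotInfoKHR setupSlot = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
    .slotIndex = 0,                            // DPB slot to activate on setup
    .pPictureResource = &reconstructedPicture, // assumed VkVideoPictureResourceInfoKHR
};

VkVideoEncodeInfoKHR encodeInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_INFO_KHR,
    .pNext = &codecPictureInfo,                // e.g. VkVideoEncodeH264PictureInfoKHR
    .dstBuffer = bitstreamBuffer,              // assumed bitstream buffer
    .dstBufferOffset = 0,
    .dstBufferRange = bitstreamBufferSize,
    .srcPictureResource = inputPicture,
    .pSetupReferenceSlot = &setupSlot,         // request reference picture setup
    .referenceSlotCount = 0,                   // no active reference pictures here
    .pReferenceSlots = NULL,
};

vkCmdEncodeVideoKHR(commandBuffer, &encodeInfo);
```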
Capabilities
Video Encode Quality Levels
Implementations can support more than one video encode quality level for a video encode profile. Video encode quality levels control the number and type of implementation-specific encoding tools and algorithms utilized in the encoding process.
Generally, using higher video encode quality levels may produce higher quality video streams at the cost of additional processing time. However, as the final quality of an encoded picture depends on the contents of the encode input picture, the contents of the active reference pictures, the codec-specific encode parameters, and the particular implementation-specific tools used corresponding to the individual video encode quality levels, there are no guarantees that using a higher video encode quality level will always produce a higher quality encoded picture for any given set of inputs.
Retrieving Encoded Session Parameters
Any codec-specific parameters stored in video session parameters objects may need to be separately encoded and included in the final video bitstream data, depending on the used video compression standard. In such cases the application must call the vkGetEncodedVideoSessionParametersKHR command to retrieve the encoded parameter data from the used video session parameters object in order to be able to produce a compliant video bitstream.
This is needed because implementations may have changed some of the codec-specific parameters stored in the video session parameters object, as defined in the Video Encode Parameter Overrides section. In addition, the vkGetEncodedVideoSessionParametersKHR command enables the application to retrieve the encoded parameter data without having to encode these codec-specific parameters manually.
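A hedged sketch of the usual two-call idiom for retrieving the encoded parameter data follows; the device and video session parameters handles are assumed to exist, and the codec-specific get-info structure (for example, VkVideoEncodeH264SessionParametersGetInfoKHR for H.264) that selects which parameter sets to encode is omitted for brevity.

```c
// Hedged sketch: retrieving the encoded parameter data (e.g. H.264 SPS/PPS)
// from a video session parameters object using the two-call size query idiom.
VkVideoEncodeSessionParametersGetInfoKHR getInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_SESSION_PARAMETERS_GET_INFO_KHR,
    .videoSessionParameters = videoSessionParameters, // assumed handle
};

VkVideoEncodeSessionParametersFeedbackInfoKHR feedback = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_SESSION_PARAMETERS_FEEDBACK_INFO_KHR,
};

size_t dataSize = 0;
vkGetEncodedVideoSessionParametersKHR(device, &getInfo, &feedback, &dataSize, NULL);

void* encodedParams = malloc(dataSize); // assumes <stdlib.h>
vkGetEncodedVideoSessionParametersKHR(device, &getInfo, &feedback, &dataSize, encodedParams);

// feedback.hasOverrides indicates whether the implementation overrode any of
// the application-provided codec-specific parameters.
```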
Video Encode Commands
Video Encode Rate Control
The size of the encoded bitstream data produced by video encode operations is a function of the following set of constraints:
- The capabilities of the compression algorithms defined and employed by the used video compression standard;
- Restrictions imposed by the selected video profile according to the rules defined by the used video compression standard;
- Further restrictions imposed by the capabilities supported by the implementation for the selected video profile;
- The image data in the encode input picture and the set of active reference pictures (as these affect the effectiveness of the compression algorithms employed by the video encode operations);
- The set of codec-specific and codec-independent encoding parameters provided by the application.
These also inherently define the set of decoder capabilities required for reconstructing and processing the picture data in the encoded bitstream.
Video coding uses bitrate as the quantitative metric associated with encoded bitstream data size: it expresses the rate at which video bitstream data can be transferred or processed, measured in bits per second. This bitrate is a function of both the encoded bitstream data size of the encoded pictures and the frame rate used by the video sequence.
Rate control algorithms are used by video encode operations to enable adjusting encoding parameters to achieve a target bitrate, or otherwise directly or indirectly control the bitrate of the generated video bitstream data. These algorithms are usually not defined by the used video compression standard, although some video compression standards do provide non-normative guidelines for implementations.
Accordingly, this specification does not mandate implementations to produce identical encoded bitstream data outputs in response to video encode operations, however, it does define a set of codec-independent and codec-specific parameters that enable the application to control the behavior of the rate control algorithms supported by the implementation. Some of these parameters guarantee certain implementation behavior while others provide guidance for implementations to apply various rate control heuristics.
To achieve the desired rate control behavior and hit the set bitrate targets, applications need to configure the rate control parameters appropriately and follow the promises they make to the implementation through the parameters that provide guidance to the implementation's rate control algorithms and heuristics. In addition, the behavior of rate control may differ across implementations even if the capabilities of the used video profile match between those implementations, because implementations may apply different rate control algorithms or heuristics internally; thus, even the same set of guidance parameter values may have different effects on the rate control behavior across implementations.
Rate Control Modes
After a video session is reset to the initial state, the default behavior and parameters of video encode rate control are entirely implementation-dependent and the application cannot affect the bitrate or quality parameters of the encoded bitstream data produced by video encode operations unless the application changes the rate control configuration of the video session, as described in the Video Coding Control section.
For each supported video profile, the implementation may expose a set of rate control modes that are available for use by the application when encoding bitstreams targeting that video profile. These modes allow using different rate control algorithms that fall into one of the following two categories:
- Per-operation rate control
- Stream-level rate control
In case of per-operation rate control, the bitrate of the generated video bitstream data is indirectly controlled by quality, size, or other encoding parameters specified by the application for each individual video encode operation.
In case of stream-level rate control, the application can directly specify target bitrates, in addition to other encoding parameters, to control the behavior of the rate control algorithm used by the implementation across multiple video encode operations.
Leaky Bucket Model
Video encoding implementations use the leaky bucket model for stream-level rate control. The leaky bucket is a concept referring to the interface between the video encoder and the consumer (for example, a network connection): the video encoder produces encoded bitstream data corresponding to the encoded pictures and adds it to the leaky bucket, while its contents are drained by the consumer.
Analogously, a similar leaky bucket is considered to exist at the input interface of a video decoder, into which encoded bitstream data is continuously added and is subsequently consumed by the video decoder. It is desirable to avoid overflowing or underflowing this leaky bucket because:
- In case of an underflow, the video decoder will be unable to consume encoded bitstream data in order to decode pictures (and optionally display them).
- In case of an overflow, the leaky bucket will be unable to accommodate more encoded bitstream data and such data may need to be thrown away, leading to the loss of the corresponding encoded pictures.
These requirements can be satisfied by imposing various constraints on the encoder-side leaky bucket to avoid its overflow or underflow, depending on the used rate control algorithm and codec parameters. However, enumerating these constraints is outside the scope of this specification.
The term virtual buffer is often used as an alternative to refer to the leaky bucket.
This virtual buffer model is defined by the following parameters:
- The bitrate (R) at which the encoded bitstream is expected to be processed.
- The size (B) of the virtual buffer.
- The initial occupancy (F) of the virtual buffer.
In this model the virtual buffer is used to smooth out fluctuations in the bitrate of the encoded bitstream over time without experiencing buffer overflow or underflow, as long as the bitrate of the encoded stream does not diverge from the target bitrate for extended periods of time.
This buffering may inherently impose a processing delay, as the goal of the model is to enable decoders to maintain a consistent processing rate for an encoded bitstream with a varying data rate.
The initial or start-up delay (D) is computed as:

D = F / R
Applications need to configure the virtual buffer with sufficient size to avoid or minimize buffer overflows and underflows while also keeping it small enough to meet their latency goals.
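As a simple illustration of the relationship above (numbers chosen purely for illustration): with a bitrate of R = 5 Mbit/s, a virtual buffer size of B = 10 Mbit, and an initial occupancy of F = 5 Mbit, the start-up delay is D = 5 Mbit / 5 Mbit/s = 1 second. Halving the initial occupancy would halve the start-up delay, at the cost of leaving less margin against underflow if the encoded data rate temporarily exceeds the target bitrate.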
Rate Control Layers
Some video compression standards and video profiles allow associating encoded pictures with specific video coding layers. The name, identification, and semantics associated with such video coding layers are defined by the corresponding video compression standards.
Analogously, stream-level rate control can be configured to use one or more rate control layers:
- When a single rate control layer is configured, it is applied to all encoded pictures, regardless of the picture’s video coding layer. In this case the distribution of the available bitrate budget across video coding layers is implementation-dependent.
- When multiple rate control layers are configured, each rate control layer is applied to the corresponding video coding layer, i.e. only across encoded pictures pertaining to the corresponding video coding layer.
Individual rate control layers are identified using layer indices between zero and N-1, where N is the number of active rate control layers.
Rate control layers are only applicable when using stream-level rate control modes.
Rate Control State
Rate control state is maintained by the implementation in the
video session objects and its parameters are specified
using an instance of the VkVideoEncodeRateControlInfoKHR
structure.
The complete rate control state of a video session is defined by the
following set of parameters:
- The values of the members of the VkVideoEncodeRateControlInfoKHR structure used to configure the rate control state.
- The values of the members of any VkVideoEncodeRateControlLayerInfoKHR structures specified in VkVideoEncodeRateControlInfoKHR::pLayers used to configure the state of individual rate control layers.
- If the video session was created with an H.264 encode profile:
  - The values of the members of the VkVideoEncodeH264RateControlInfoKHR structure, if one is specified in the pNext chain of the VkVideoEncodeRateControlInfoKHR structure used to configure the rate control state.
  - The values of the members of any VkVideoEncodeH264RateControlLayerInfoKHR structures included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure used to configure the state of a rate control layer.
- If the video session was created with an H.265 encode profile:
  - The values of the members of the VkVideoEncodeH265RateControlInfoKHR structure, if one is specified in the pNext chain of the VkVideoEncodeRateControlInfoKHR structure used to configure the rate control state.
  - The values of the members of any VkVideoEncodeH265RateControlLayerInfoKHR structures included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure used to configure the state of a rate control layer.
Two rate control states match if all the parameters listed above match between them.
Rate Control Layer State
The configuration of individual rate control layers is specified using an
instance of the VkVideoEncodeRateControlLayerInfoKHR
structure.
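Putting the above together, the following hedged sketch configures single-layer stream-level rate control for a video session from within a video coding scope; the chosen mode, bitrates, frame rate, and virtual buffer sizes are illustrative values only, and codec-specific rate control structures (such as the H.264 or H.265 ones listed above) would additionally be chained into the respective pNext chains.

```c
// Hedged sketch: configuring single-layer VBR stream-level rate control for a
// video session. Bitrates, frame rate, and virtual buffer sizes are
// illustrative assumptions.
VkVideoEncodeRateControlLayerInfoKHR layer = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_RATE_CONTROL_LAYER_INFO_KHR,
    .averageBitrate = 5000000,      // 5 Mbit/s target
    .maxBitrate = 8000000,          // 8 Mbit/s peak
    .frameRateNumerator = 30,
    .frameRateDenominator = 1,
};

VkVideoEncodeRateControlInfoKHR rateControlInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_RATE_CONTROL_INFO_KHR,
    .rateControlMode = VK_VIDEO_ENCODE_RATE_CONTROL_MODE_VBR_BIT_KHR,
    .layerCount = 1,
    .pLayers = &layer,
    .virtualBufferSizeInMs = 2000,          // leaky bucket size B expressed in time
    .initialVirtualBufferSizeInMs = 1000,   // initial occupancy F expressed in time
};

VkVideoCodingControlInfoKHR codingControlInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_CODING_CONTROL_INFO_KHR,
    .pNext = &rateControlInfo,
    .flags = VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR,
};

vkCmdControlVideoCodingKHR(commandBuffer, &codingControlInfo);
```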
H.264 Encode Operations
Video encode operations using an H.264 encode profile can be used to encode elementary video stream sequences compliant to the ITU-T H.264 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video encode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH264SequenceParameterSet structure corresponding to the active SPS specifying the H.264 sequence parameter set.
  - The StdVideoH264PictureParameterSet structure corresponding to the active PPS specifying the H.264 picture parameter set.
  - The StdVideoEncodeH264PictureInfo structure specifying the H.264 picture information.
  - The StdVideoEncodeH264SliceHeader structures specifying the H.264 slice header parameters for each encoded H.264 slice.
  - The StdVideoEncodeH264ReferenceInfo structures specifying the H.264 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The encoded bitstream data is written to the destination video bitstream buffer range as defined in the H.264 Encode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used encode input picture, active reference pictures, and optional reconstructed picture is accessed as defined in the H.264 Encode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.264 picture information.
If the parameters adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video encode operation will complete successfully. Otherwise, the video encode operation may complete unsuccessfully.
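As an illustration of how these codec-specific parameters are provided to the encode operation, the hedged sketch below chains the H.264 picture information and a single slice entry into the encode operation; the Video Std structures are assumed to be populated elsewhere according to the referenced semantics.

```c
// Hedged sketch: providing the H.264 codec-specific picture information for a
// video encode operation by chaining it into VkVideoEncodeInfoKHR::pNext.
// A single slice is encoded here; the Std structures are placeholders.
StdVideoEncodeH264PictureInfo stdPictureInfo = { /* H.264 picture information */ 0 };
StdVideoEncodeH264SliceHeader stdSliceHeader = { /* H.264 slice header parameters */ 0 };

VkVideoEncodeH264NaluSliceInfoKHR sliceInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_NALU_SLICE_INFO_KHR,
    .constantQp = 0,                    // only used with disabled rate control
    .pStdSliceHeader = &stdSliceHeader,
};

VkVideoEncodeH264PictureInfoKHR h264PictureInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_PICTURE_INFO_KHR,
    .naluSliceEntryCount = 1,
    .pNaluSliceEntries = &sliceInfo,
    .pStdPictureInfo = &stdPictureInfo,
    .generatePrefixNalu = VK_FALSE,
};

// encodeInfo is the VkVideoEncodeInfoKHR passed to vkCmdEncodeVideoKHR
encodeInfo.pNext = &h264PictureInfo;
```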
H.264 Encode Parameter Overrides
Implementations may override, unless otherwise specified, any of the H.264 encode parameters specified in the following Video Std structures:
- StdVideoH264SequenceParameterSet
- StdVideoH264PictureParameterSet
- StdVideoEncodeH264PictureInfo
- StdVideoEncodeH264SliceHeader
- StdVideoEncodeH264ReferenceInfo
All such H.264 encode parameter overrides must fulfill the conditions defined in the Video Encode Parameter Overrides section.
In addition, implementations must not override any of the following H.264 encode parameters:
- StdVideoEncodeH264PictureInfo::primary_pic_type
- StdVideoEncodeH264SliceHeader::slice_type
In case of H.264 encode parameters stored in video session parameters objects, applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened. If the query indicates that implementation overrides were applied, then the application needs to retrieve and use the encoded H.264 parameter sets in the bitstream in order to be able to produce a compliant H.264 video bitstream using the H.264 encode parameters stored in the video session parameters object.
In case of any H.264 encode parameters stored in the encoded bitstream
produced by video encode operations, if the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to those H.264 encode parameters.
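A hedged sketch of creating a query pool capable of reporting this feedback follows; the chained encodeProfileInfo is an assumed, already-populated VkVideoProfileInfoKHR describing the H.264 encode profile.

```c
// Hedged sketch: creating a query pool that can report, among other encode
// feedback values, whether bitstream-level parameter overrides were applied.
VkQueryPoolVideoEncodeFeedbackCreateInfoKHR feedbackCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_QUERY_POOL_VIDEO_ENCODE_FEEDBACK_CREATE_INFO_KHR,
    .pNext = &encodeProfileInfo, // assumed VkVideoProfileInfoKHR for the encode profile
    .encodeFeedbackFlags =
        VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR |
        VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR,
};

VkQueryPoolCreateInfo queryPoolCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO,
    .pNext = &feedbackCreateInfo,
    .queryType = VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR,
    .queryCount = 1,
};

VkQueryPool encodeFeedbackQueryPool = VK_NULL_HANDLE;
vkCreateQueryPool(device, &queryPoolCreateInfo, NULL, &encodeFeedbackQueryPool);

// The query is then begun/ended around vkCmdEncodeVideoKHR and its results
// retrieved with vkGetQueryPoolResults, yielding one value per enabled
// feedback flag.
```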
H.264 Encode Bitstream Data Access
Each video encode operation writes one or more VCL NAL units comprising the slice headers and data of the encoded picture, in the format defined in sections 7.3.3 and 7.3.4, according to the semantics defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
The number of VCL NAL units written is specified by
VkVideoEncodeH264PictureInfoKHR::naluSliceEntryCount
.
In addition, if VkVideoEncodeH264PictureInfoKHR::generatePrefixNalu is set to VK_TRUE for the video encode operation, then an additional prefix NAL unit is written before each VCL NAL unit corresponding to an individual slice, in the format defined in section 7.3.2.12 and according to the semantics defined in section 7.4.2.12 of the ITU-T H.264 Specification.
H.264 Encode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of an encode input picture, reference picture, or reconstructed picture accessed by video coding operations using an H.264 encode profile is defined as the set of texels within the coordinate range:

- ([0, endX), [0, endY))

Where:

- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;

Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
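The following hedged sketch computes endX and endY as described above; the subresourceWidth and subresourceHeight variables describing the referenced image subresource are assumptions of this illustration.

```c
// Hedged sketch: computing the accessed texel range for a picture, following
// the rounding and clamping rules above.
static inline uint32_t roundUpToMultiple(uint32_t value, uint32_t granularity)
{
    return ((value + granularity - 1) / granularity) * granularity;
}

uint32_t endX = roundUpToMultiple(pictureResource.codedExtent.width,
                                  capabilities.pictureAccessGranularity.width);
uint32_t endY = roundUpToMultiple(pictureResource.codedExtent.height,
                                  capabilities.pictureAccessGranularity.height);

if (endX > subresourceWidth)  endX = subresourceWidth;   // clamp to subresource width
if (endY > subresourceHeight) endY = subresourceHeight;  // clamp to subresource height
```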
In case of video encode operations using an H.264
encode profile, any access to a picture at the coordinates
(x
,y
), as defined by the ITU-T H.264
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
(x
,y
).
Implementations may choose not to access some or all texels within particular reference pictures available to a video encode operation (e.g. due to video encode parameter overrides restricting the effective set of used reference pictures, or if the encoding algorithm chooses not to use certain subregions of the reference picture data for sample prediction).
H.264 Frame, Picture, and Slice
H.264 pictures are partitioned into slices, as defined in section 6.3 of the ITU-T H.264 Specification.
For the purposes of this specification, the H.264 slices comprising a picture are referred to as the picture partitions of the picture.
Video encode operations using an H.264 encode profile can encode slices of different types, as defined in section 7.4.3 of the ITU-T H.264 Specification, by specifying the corresponding enumeration constant value in StdVideoEncodeH264SliceHeader::slice_type in the H.264 slice header parameters from the Video Std enumeration type StdVideoH264SliceType:
-
STD_VIDEO_H264_SLICE_TYPE_P
indicates that the slice is a P slice as defined in section 3.109 of the ITU-T H.264 Specification. -
STD_VIDEO_H264_SLICE_TYPE_B
indicates that the slice is a B slice as defined in section 3.9 of the ITU-T H.264 Specification. -
STD_VIDEO_H264_SLICE_TYPE_I
indicates that the slice is an I slice as defined in section 3.66 of the ITU-T H.264 Specification.
Pictures constructed from such slices can be of different types, as defined
in section 7.4.2.4 of the ITU-T H.264 Specification.
Video encode operations using an H.264 encode profile can encode pictures of a specific type by specifying the corresponding enumeration constant value in StdVideoEncodeH264PictureInfo::primary_pic_type in the H.264 picture information from the Video Std enumeration type StdVideoH264PictureType:
-
STD_VIDEO_H264_PICTURE_TYPE_P
indicates that the picture is a P picture. A frame consisting of a P picture is also referred to as a P frame. -
STD_VIDEO_H264_PICTURE_TYPE_B
indicates that the picture is a B picture. A frame consisting of a B picture is also referred to as a B frame. -
STD_VIDEO_H264_PICTURE_TYPE_I
indicates that the picture is an I picture. A frame consisting of an I picture is also referred to as an I frame. -
STD_VIDEO_H264_PICTURE_TYPE_IDR
indicates that the picture is a special type of I picture called an IDR picture as defined in section 3.69 of the ITU-T H.264 Specification. A frame consisting of an IDR picture is also referred to as an IDR frame.
H.264 Coding Blocks
H.264 encode supports a single type of coding block called a macroblock, as defined in section 3.84 of the ITU-T H.264 Specification.
H.264 Encode Profile
H.264 Encode Capabilities
H.264 Encode Quality Level Properties
H.264 Encode Session
Additional parameters can be specified when creating a video session with an
H.264 encode profile by including an instance of the
VkVideoEncodeH264SessionCreateInfoKHR structure in the pNext
chain of VkVideoSessionCreateInfoKHR.
H.264 Encode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR
can contain the following types of parameters:
H.264 Sequence Parameter Sets (SPS)
Represented by StdVideoH264SequenceParameterSet
structures and
interpreted as follows:
- reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
- seq_parameter_set_id is used as the key of the SPS entry;
- level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
- if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
- if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
  - reserved1 is used only for padding purposes and is otherwise ignored;
  - if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - reserved1 is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
- all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.
H.264 Picture Parameter Sets (PPS)
Represented by StdVideoH264PictureParameterSet
structures and
interpreted as follows:
- the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
- if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
- all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.
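The hedged sketch below shows one way such parameter sets could be added when creating an H.264 encode video session parameters object; the Video Std structures are assumed to be populated according to the interpretation rules above, and videoSession is an assumed H.264 encode video session handle.

```c
// Hedged sketch: creating an H.264 encode video session parameters object
// that initially stores one SPS and one PPS.
StdVideoH264SequenceParameterSet stdSPS = { /* seq_parameter_set_id = 0, ... */ 0 };
StdVideoH264PictureParameterSet  stdPPS = { /* pic_parameter_set_id = 0, ... */ 0 };

VkVideoEncodeH264SessionParametersAddInfoKHR addInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_SESSION_PARAMETERS_ADD_INFO_KHR,
    .stdSPSCount = 1,
    .pStdSPSs = &stdSPS,
    .stdPPSCount = 1,
    .pStdPPSs = &stdPPS,
};

VkVideoEncodeH264SessionParametersCreateInfoKHR h264ParamsCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_SESSION_PARAMETERS_CREATE_INFO_KHR,
    .maxStdSPSCount = 1,
    .maxStdPPSCount = 1,
    .pParametersAddInfo = &addInfo,
};

VkVideoSessionParametersCreateInfoKHR paramsCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_SESSION_PARAMETERS_CREATE_INFO_KHR,
    .pNext = &h264ParamsCreateInfo,
    .videoSession = videoSession, // assumed: an H.264 encode video session
};

VkVideoSessionParametersKHR h264SessionParameters = VK_NULL_HANDLE;
vkCreateVideoSessionParametersKHR(device, &paramsCreateInfo, NULL, &h264SessionParameters);
```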
Implementations may override any of these parameters according to the semantics defined in the Video Encode Parameter Overrides section before storing the resulting H.264 parameter sets into the video session parameters object. Applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened and to retrieve the encoded H.264 parameter sets in order to be able to produce a compliant H.264 video bitstream.
Such H.264 parameter set overrides may also have cascading effects on the
implementation overrides applied to the encoded bitstream produced by video
encode operations.
If the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, then the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to the encoded bitstream.