Video Coding

Vulkan implementations may expose one or more queue families supporting video coding operations. These operations are performed by recording them into a command buffer within a video coding scope, and submitting them to queues with compatible video coding capabilities.

The Vulkan video functionalities are designed to be made available through a set of APIs built on top of each other, consisting of:

  • A core API providing common video coding functionalities,
  • APIs providing codec-independent video decode and video encode related functionalities, respectively,
  • Additional codec-specific APIs built on top of those.

This chapter details the fundamental components and operations of these.

Video Picture Resources

In the context of video coding, multidimensional arrays of image data that can be used as the source or target of video coding operations are referred to as video picture resources. They may store additional metadata that includes implementation-private information used during the execution of video coding operations, as discussed later.

Video picture resources are backed by VkImage objects. Individual subregions of VkImageView objects created from such resources can be used as decode output pictures, encode input pictures, reconstructed pictures, and/or reference pictures.

The parameters of a video picture resource are specified using a VkVideoPictureResourceInfoKHR structure.

VkVideoPictureResourceInfoKHR - Structure specifying the parameters of a video picture resource

Decoded Picture Buffer

An integral part of video coding pipelines is the reconstruction of pictures from a compressed video bitstream. A reconstructed picture is a video picture resource resulting from this process.

Such reconstructed pictures can be used as reference pictures in subsequent video coding operations to provide predictions of the values of samples of subsequently decoded or encoded pictures. The correct use of such reconstructed pictures as reference pictures is driven by the video compression standard, the implementation, and the application-specific use cases.

The list of reference pictures used to provide such predictions within a single video coding operation is referred to as the list of active reference pictures.

The decoded picture buffer (DPB) is an indexed data structure that maintains the set of reference pictures available to be used in video coding operations.

Individual indexed entries of the DPB are referred to as decoded picture buffer (DPB) slots.
The range of valid DPB slot indices is between zero and N-1, where N is the capacity of the DPB. Each DPB slot can refer to a reference picture containing a video frame, or to up to two reference pictures containing the top and/or bottom fields that, when both present, together represent a full video frame.

In Vulkan, the state and the backing store of the DPB are separated as follows:

In addition, the implementation may also maintain opaque metadata associated with DPB slots, including:

  • Reference picture metadata corresponding to the video picture resource associated with the DPB slot.

Such metadata may be stored by the implementation as part of the DPB slot state maintained by the video session, or as part of the video picture resource backing the DPB slot.

Any metadata stored in the video picture resources backing DPB slots is independent of the video session used to store it, hence such video picture resources can be shared with other video sessions. Correspondingly, any metadata that is dependent on the video session will always be stored as part of the DPB slot state maintained by that video session.

The responsibility of managing the DPB is split between the application and the implementation as follows:

In addition, the application is also responsible for managing the mapping between the codec-specific picture IDs and DPB slots, and any other codec-specific states unless otherwise specified.

DPB Slot States

At a given time, each DPB slot is either in active or inactive state. Initially, all DPB slots managed by a video session are in inactive state.

A DPB slot can be activated by using it as the target of picture reconstruction in a video coding operation in which the reconstructed picture is requested to be set up as a reference picture, according to the codec-specific semantics. This changes the slot's state to active and associates it with a picture reference to the reconstructed picture.

Some video coding standards allow multiple picture references to be associated with a single DPB slot. In this case the state of the individual picture references can be independently updated.

As an example, H.264 decoding allows associating a separate top field and bottom field picture with the same DPB slot.

As part of reference picture setup, the implementation may also generate reference picture metadata. Such reference picture metadata is specific to each picture reference associated with the DPB slot.

If such a video coding operation completes successfully, the activated DPB slot will have a valid picture reference and the reconstructed picture is associated with the DPB slot. This is true even if the DPB slot is used as the target of a picture reconstruction that only sets up a top field or bottom field reference picture and thus does not yet refer to a complete frame. However, if any data provided as input to such a video coding operation is not compliant with the video compression standard used, that video coding operation may complete unsuccessfully, in which case the activated DPB slot will have an invalid picture reference. This is true even if the DPB slot previously had a valid picture reference to a top field or bottom field reference picture, but the reconstruction of the other field corresponding to the DPB slot failed.

The application can use queries to get feedback about the outcome of video coding operations and use the resulting VkQueryResultStatusKHR value to determine whether the video coding operation completed successfully (result status is positive) or unsuccessfully (result status is negative).
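As an illustrative sketch (plain Python, not Vulkan API code), the sign convention for the result status value can be captured as follows; the enum constants named in the comments are from VkQueryResultStatusKHR:

```python
def interpret_result_status(status: int) -> str:
    """Map a VkQueryResultStatusKHR value to an outcome, following the
    convention above: positive means success, negative means failure,
    and zero means results are not yet available."""
    if status > 0:
        return "success"        # e.g. VK_QUERY_RESULT_STATUS_COMPLETE_KHR
    if status < 0:
        return "error"          # e.g. VK_QUERY_RESULT_STATUS_ERROR_KHR
    return "not_available"      # VK_QUERY_RESULT_STATUS_NOT_READY_KHR
```

Applications should therefore test the sign of the status rather than compare against a specific enum value, since additional positive or negative status codes may be introduced by extensions.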

Using a reference picture associated with a DPB slot that has an invalid picture reference as an active reference picture in subsequent video coding operations is legal; however, the contents of the outputs of such operations are undefined, and any DPB slots activated by such video coding operations will also have an invalid picture reference. This is true even if such video coding operations otherwise complete successfully.

A DPB slot can also be deactivated by the application, changing its state to inactive and invalidating any picture references and reference picture metadata associated with the DPB slot.

If an already active DPB slot is used as the target of picture reconstruction in a video coding operation, but the decoded picture is not requested to be set up as a reference picture, according to the codec-specific semantics, then no reference picture setup happens and the corresponding picture reference and reference picture metadata are invalidated within the DPB slot. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.

If an already active DPB slot is used as the target of picture reconstruction when decoding a field picture that is not marked as reference, then the behavior is as follows:

  • If the DPB slot is currently associated with a frame, then the DPB slot is deactivated.
  • If the DPB slot is not currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the decoded picture is a bottom field picture, then the other field picture association of the DPB slot, if any, is not disturbed.
  • If the DPB slot is currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is currently associated with a bottom field picture and the decoded picture is a bottom field picture, then that picture association is invalidated, without disturbing the other field picture association, if any. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.

A DPB slot can be activated with a new frame even if it is already active. In this case all previous associations of the DPB slot with reference pictures are replaced with an association with the reconstructed picture used to activate it.

If an already active DPB slot is activated with a reconstructed field picture, then the behavior is as follows:

  • If the DPB slot is currently associated with a frame, then that association is replaced with an association with the reconstructed field picture used to activate it.
  • If the DPB slot is not currently associated with a top field picture and the DPB slot is activated with a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the DPB slot is activated with a bottom field picture, then the DPB slot is associated with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
  • If the DPB slot is currently associated with a top field picture and the DPB slot is activated with a new top field picture, or if the DPB slot is currently associated with a bottom field picture and the DPB slot is activated with a new bottom field picture, then that association is replaced with an association with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
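The activation and invalidation rules described above can be modeled compactly. The following is an illustrative Python sketch of the DPB slot state machine, not Vulkan API code; the picture kinds "frame", "top", and "bottom" are stand-in labels for the codec-level picture types:

```python
class DpbSlot:
    """Models one DPB slot's picture associations per the rules above."""

    def __init__(self):
        self.pictures = {}          # keys: "frame", "top", "bottom"

    @property
    def active(self):
        # A slot is active while it has at least one picture association.
        return bool(self.pictures)

    def activate(self, kind, picture):
        """Reference picture setup targeting this slot."""
        if kind == "frame" or "frame" in self.pictures:
            # A frame replaces all associations; a field activation on a
            # slot currently holding a frame also replaces that frame.
            self.pictures = {kind: picture}
        else:
            # A field activation replaces only the matching field,
            # leaving the other field association (if any) undisturbed.
            self.pictures[kind] = picture

    def reconstruct_without_setup(self, kind):
        """Picture reconstruction targeting this slot without reference
        picture setup: the matching association is invalidated, and the
        slot is implicitly deactivated once no associations remain."""
        if kind == "frame" or "frame" in self.pictures:
            self.pictures = {}
        else:
            self.pictures.pop(kind, None)

    def deactivate(self):
        """Explicit deactivation by the application."""
        self.pictures = {}
```

For example, activating a slot with a top field and then a bottom field leaves both field associations in place, while subsequently activating it with a frame replaces both.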

Video Profiles

VkVideoProfileInfoKHR - Structure specifying a video profile
VkVideoCodecOperationFlagBitsKHR - Video codec operation bits
VkVideoCodecOperationFlagsKHR - Bitmask of VkVideoCodecOperationFlagBitsKHR
VkVideoChromaSubsamplingFlagBitsKHR - Video format chroma subsampling bits

Chroma subsampling is described in more detail in the Chroma Reconstruction section.

VkVideoChromaSubsamplingFlagsKHR - Bitmask of VkVideoChromaSubsamplingFlagBitsKHR
VkVideoComponentBitDepthFlagBitsKHR - Video format component bit depth
VkVideoComponentBitDepthFlagsKHR - Bitmask of VkVideoComponentBitDepthFlagBitsKHR
VkVideoDecodeUsageInfoKHR - Structure specifying video decode usage information
VkVideoDecodeUsageFlagBitsKHR - Video decode usage flags
VkVideoDecodeUsageFlagsKHR - Bitmask specifying the video decode usage flags
VkVideoEncodeUsageInfoKHR - Structure specifying video encode usage information
VkVideoEncodeUsageFlagBitsKHR - Video encode usage flags
VkVideoEncodeUsageFlagsKHR - Bitmask specifying the video encode usage flags
VkVideoEncodeContentFlagBitsKHR - Video encode content flags
VkVideoEncodeContentFlagsKHR - Bitmask specifying the video encode content flags
VkVideoEncodeTuningModeKHR - Video encode tuning mode
VkVideoProfileListInfoKHR - Structure specifying one or more video profiles used in conjunction

Video Capabilities

Video Coding Capabilities

vkGetPhysicalDeviceVideoCapabilitiesKHR - Query video coding capabilities
VkVideoCapabilitiesKHR - Structure describing general video capabilities for a video profile
VkVideoCapabilityFlagBitsKHR - Video decode and encode capability bits
VkVideoCapabilityFlagsKHR - Bitmask of VkVideoCapabilityFlagBitsKHR

Video Format Capabilities

vkGetPhysicalDeviceVideoFormatPropertiesKHR - Query supported video decode and encode image formats and capabilities
VkPhysicalDeviceVideoFormatInfoKHR - Structure specifying the codec video format
VkVideoFormatPropertiesKHR - Structure enumerating the video image formats

Video Sessions

VkVideoSessionKHR - Opaque handle to a video session object

Creating a Video Session

vkCreateVideoSessionKHR - Creates a video session object
VkVideoSessionCreateInfoKHR - Structure specifying parameters of a newly created video session
VkVideoSessionCreateFlagBitsKHR - Video session creation flags
VkVideoSessionCreateFlagsKHR - Bitmask of VkVideoSessionCreateFlagBitsKHR

Destroying a Video Session

vkDestroyVideoSessionKHR - Destroy video session object

Video Session Memory Association

After creating a video session object, and before the object can be used to record video coding operations into command buffers, the application must allocate and bind device memory to the video session. Device memory is allocated separately (see Device Memory) and then associated with the video session.

Video sessions may have multiple memory bindings, each identified by a unique unsigned integer value. Appropriate device memory must be bound to each such memory binding before the video session can be used to record command buffer commands.

vkGetVideoSessionMemoryRequirementsKHR - Get the memory requirements for a video session
VkVideoSessionMemoryRequirementsKHR - Structure describing video session memory requirements
vkBindVideoSessionMemoryKHR - Bind Video Memory
VkBindVideoSessionMemoryInfoKHR - Structure specifying memory bindings for a video session object

Video Profile Compatibility

Resources and query pools used with a particular video session must be compatible with the video profile the video session was created with.

A VkBuffer is compatible with a video profile if it was created with the VkBufferCreateInfo::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing an element matching the VkVideoProfileInfoKHR structure chain describing the video profile, and VkBufferCreateInfo::usage including at least one bit specific to video coding usage.

  • VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR
  • VK_BUFFER_USAGE_VIDEO_DECODE_DST_BIT_KHR
  • VK_BUFFER_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
  • VK_BUFFER_USAGE_VIDEO_ENCODE_DST_BIT_KHR

A VkBuffer is also compatible with a video profile if it was created with VkBufferCreateInfo::flags including VK_BUFFER_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR.
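The buffer compatibility rule above can be expressed as a simple predicate. This is an illustrative Python sketch, not Vulkan API code; the parameters are stand-ins for the corresponding VkBufferCreateInfo fields, with usage and flags modeled as sets of bit names:

```python
# Video-coding-specific buffer usage bits named in the rule above.
VIDEO_USAGE_BITS = {
    "VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR",
    "VK_BUFFER_USAGE_VIDEO_DECODE_DST_BIT_KHR",
    "VK_BUFFER_USAGE_VIDEO_ENCODE_SRC_BIT_KHR",
    "VK_BUFFER_USAGE_VIDEO_ENCODE_DST_BIT_KHR",
}

def buffer_compatible(profile, profile_list, usage, flags):
    """profile_list models VkVideoProfileListInfoKHR::pProfiles;
    usage and flags model VkBufferCreateInfo::usage / ::flags."""
    # Profile-independent buffers are compatible with any video profile.
    if "VK_BUFFER_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR" in flags:
        return True
    # Otherwise the profile must be listed and at least one
    # video-coding usage bit must be present.
    return profile in profile_list and bool(usage & VIDEO_USAGE_BITS)
```

The analogous rule for images follows the same shape, with the image usage bits and the additional image creation conditions described below.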

A VkImage is compatible with a video profile if it was created with the VkImageCreateInfo::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing an element matching the VkVideoProfileInfoKHR structure chain describing the video profile, and VkImageCreateInfo::usage including at least one bit specific to video coding usage.

  • VK_IMAGE_USAGE_VIDEO_DECODE_SRC_BIT_KHR
  • VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR
  • VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR
  • VK_IMAGE_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
  • VK_IMAGE_USAGE_VIDEO_ENCODE_DST_BIT_KHR
  • VK_IMAGE_USAGE_VIDEO_ENCODE_DPB_BIT_KHR

A VkImage is also compatible with a video profile if all of the following conditions are true for the VkImageCreateInfo structure the image was created with:

While some of these rules allow creating buffer or image resources that may be compatible with any video profile, applications should still include the specific video profiles the resource is expected to be used with (through a VkVideoProfileListInfoKHR structure in the pNext chain of the corresponding create info structure) whenever the complete set of video profiles is known at resource creation time. This enables the implementation to optimize the created resource for the specific use case. In the absence of that information, the implementation may have to make conservative decisions about the memory requirements or representation of the resource.

A VkImageView is compatible with a video profile if the VkImage it was created from is also compatible with that video profile.

A VkQueryPool is compatible with a video profile if it was created with the VkQueryPoolCreateInfo::pNext chain including a VkVideoProfileInfoKHR structure chain describing the same video profile, and VkQueryPoolCreateInfo::queryType having one of the following values:

  • VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR
  • VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR

Video Session Parameters

Video session parameters objects can store preprocessed codec-specific parameters used with a compatible video session, reducing the number of parameters that must be provided and processed by the implementation while recording video coding operations into command buffers.

Parameters stored in such objects are immutable to facilitate the concurrent use of the stored parameters in multiple threads. At the same time, new parameters can be added to existing objects using the vkUpdateVideoSessionParametersKHR command.

In order to support concurrent use of the stored immutable parameters while also allowing the video session parameters object to be extended with new parameters, each video session parameters object maintains an update sequence counter that is set to 0 at object creation time and must be incremented by each subsequent update operation.
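The counter contract can be sketched as follows in plain Python (not Vulkan API code); the entries dictionary and template argument are stand-ins for the stored parameter sets and the template object mentioned below, and the real API additionally keeps previously stored parameters immutable, which this sketch does not enforce:

```python
class SessionParameters:
    """Models a video session parameters object's update sequencing."""

    def __init__(self, template=None):
        # Parameters not specified at creation time are copied
        # unmodified from the template object, if one is provided.
        self.entries = dict(template.entries) if template else {}
        self.update_sequence_count = 0   # set to 0 at object creation

    def update(self, update_sequence_count, new_entries):
        """Models vkUpdateVideoSessionParametersKHR: each update must
        carry the next sequence number."""
        if update_sequence_count != self.update_sequence_count + 1:
            raise ValueError("out-of-order update sequence count")
        self.entries.update(new_entries)
        self.update_sequence_count = update_sequence_count
```

The strictly incrementing counter lets the implementation detect racing updates from multiple threads while leaving reads of already stored parameters lock-free.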

Certain video sequences that adhere to particular video compression standards permit updating previously supplied parameters. If a parameter update is necessary, the application has the following options:

  • Cache the set of parameters on the application side and create a new video session parameters object adding all the parameters with appropriate changes, as necessary; or
  • Create a new video session parameters object providing only the updated parameters and the previously used object as the template, which ensures that parameters not specified at creation time will be copied unmodified from the template object.

The types of parameters that can be stored, the capacity for each parameter type, and the methods of initializing, updating, and referring to individual parameters are all specific to the video codec operation the video session parameters object was created with.

Video session parameters objects created with an encode operation are further specialized based on the video encode quality level the video session parameters are used with, as implementations may apply different sets of parameter overrides depending on the used quality level. This enables implementations to store the potentially optimized set of parameters in these objects, further limiting the necessary processing required while recording video encode operations into command buffers.

VkVideoSessionParametersKHR - Opaque handle to a video session parameters object

Creating Video Session Parameters

vkCreateVideoSessionParametersKHR - Creates video session parameters object
VkVideoSessionParametersCreateInfoKHR - Structure specifying parameters of a newly created video session parameters object
VkVideoSessionParametersCreateFlagsKHR - Reserved for future use

Destroying Video Session Parameters

vkDestroyVideoSessionParametersKHR - Destroy video session parameters object

Updating Video Session Parameters

vkUpdateVideoSessionParametersKHR - Update video session parameters object
VkVideoSessionParametersUpdateInfoKHR - Structure specifying video session parameters update information

Video Coding Scope

Applications can record video coding commands for a video session only within a video coding scope.

vkCmdBeginVideoCodingKHR - Begin video coding scope
VkVideoBeginCodingInfoKHR - Structure specifying video coding scope begin information
VkVideoBeginCodingFlagsKHR - Reserved for future use
VkVideoReferenceSlotInfoKHR - Structure specifying information about a reference picture slot
vkCmdEndVideoCodingKHR - End video coding scope
VkVideoEndCodingInfoKHR - Structure specifying video coding scope end information
VkVideoEndCodingFlagsKHR - Reserved for future use

Video Coding Control

vkCmdControlVideoCodingKHR - Control video coding parameters
VkVideoCodingControlInfoKHR - Structure specifying video coding control parameters
VkVideoCodingControlFlagBitsKHR - Video coding control flags
VkVideoCodingControlFlagsKHR - Bitmask of VkVideoCodingControlFlagBitsKHR

Inline Queries

If a video session was created with VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR, beginning queries using commands such as vkCmdBeginQuery within a video coding scope is not allowed. Instead, queries are executed inline by including an instance of the VkVideoInlineQueryInfoKHR structure in the pNext chain of the parameters of one of the video coding commands, with its queryPool member set to a valid VkQueryPool handle.

VkVideoInlineQueryInfoKHR - Structure specifying inline query information for video coding commands

Video Decode Operations

Video decode operations consume compressed video data from a video bitstream buffer and zero or more reference pictures, and produce a decode output picture and an optional reconstructed picture.

Such decode output pictures can be shared with the Decoded Picture Buffer, and can also be used as the input of video encode operations, with graphics or compute operations, or with Window System Integration APIs, depending on the capabilities of the implementation.

Video decode operations may access the following resources in the VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR stage:

The image subresource of each video picture resource accessed by the video decode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:

  • If the image subresource is used in the video decode operation only as decode output picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR layout.
  • If the image subresource is used in the video decode operation both as decode output picture and reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
  • If the image subresource is used in the video decode operation only as reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
  • If the image subresource is used in the video decode operation as a reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.

A video decode operation may complete unsuccessfully. In this case the decode output picture will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.

Codec-Specific Semantics

The following aspects of video decode operations are codec-specific:

  • The interpretation of the contents of the source video bitstream buffer range.
  • The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
  • The construction and interpretation of information related to the decode output picture and the generation of picture data to the corresponding image subregion.
  • The decision on reference picture setup.
  • The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.

These codec-specific behaviors are defined for each video codec operation separately.

  • If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.264 Decode Operations section.
  • If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.265 Decode Operations section.
  • If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the AV1 Decode Operations section.

Video Decode Operation Steps

Each video decode operation performs the following steps in the VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR stage:

  1. Reads the encoded video data from the source video bitstream buffer range.
  2. Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process;
  3. Writes the decoded picture data to the decode output picture, and optionally to the reconstructed picture, if one is specified and is different from the decode output picture, according to the codec-specific semantics;
  4. If reference picture setup is requested, the DPB slot index specified in the reconstructed picture information is activated with the reconstructed picture.

When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.

Capabilities

VkVideoDecodeCapabilitiesKHR - Structure describing general video decode capabilities for a video profile
VkVideoDecodeCapabilityFlagBitsKHR - Video decode capability flags
VkVideoDecodeCapabilityFlagsKHR - Bitmask of VkVideoDecodeCapabilityFlagBitsKHR

Video Decode Commands

vkCmdDecodeVideoKHR - Launch a video decode operation
VkVideoDecodeInfoKHR - Structure specifying video decode parameters
VkVideoDecodeFlagsKHR - Reserved for future use

H.264 Decode Operations

Video decode operations using an H.264 decode profile can be used to decode elementary video stream sequences compliant to the ITU-T H.264 Specification.

Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.

This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:

If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.

H.264 Decode Bitstream Data Access

If the target decode output picture is a frame, then the video bitstream buffer range should contain VCL NAL units comprising the slice headers and data of a picture representing an entire frame, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.

If the target decode output picture is a field, then the video bitstream buffer range should contain VCL NAL units comprising the slice headers and data of a picture representing a field, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.

The offsets provided in VkVideoDecodeH264PictureInfoKHR::pSliceOffsets should specify the starting offsets corresponding to each slice header within the video bitstream buffer range.

H.264 Decode Picture Data Access

The effective imageOffset and imageExtent corresponding to a decode output picture, reference picture, or reconstructed picture used in video decode operations with an H.264 decode profile are defined as follows:

  • imageOffset is (codedOffset.x,codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a frame.
  • imageOffset is (codedOffset.x,codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
  • imageOffset is (codedOffset.x,codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height / 2), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.

Where codedOffset and codedExtent are the members of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
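The three cases above reduce to a small rule: only a field in the separate-planes layout halves the effective height. As an illustrative Python sketch (not Vulkan API code, with the layout names as stand-in string constants):

```python
INTERLEAVED = "VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR"
SEPARATE = "VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR"

def effective_region(coded_offset, coded_extent, is_field, layout):
    """Effective imageOffset / imageExtent for an H.264 decode picture.

    coded_offset and coded_extent mirror the members of
    VkVideoPictureResourceInfoKHR, as (x, y) and (width, height) tuples."""
    width, height = coded_extent
    if is_field and layout == SEPARATE:
        height //= 2    # a field occupies half the frame height
    return coded_offset, (width, height)
```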

However, accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile. This means that the complete image subregion accessed by video coding operations using an H.264 decode profile for the video picture resource is defined as the set of texels within the coordinate range:

([startX,endX),[startY,endY))

Where:

  • startX equals imageOffset.x rounded down to the nearest integer multiple of pictureAccessGranularity.width;
  • endX equals imageOffset.x + imageExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
  • startY equals imageOffset.y rounded down to the nearest integer multiple of pictureAccessGranularity.height;
  • endY equals imageOffset.y + imageExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
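The rounding and clamping rules above can be sketched directly (plain Python, not Vulkan API code; all arguments are (width, height) or (x, y) tuples standing in for the structure members named above):

```python
def accessed_region(image_offset, image_extent, granularity, subresource_extent):
    """Computes ([startX, endX), [startY, endY)): the effective region
    rounded outward to picture access granularity multiples, with the
    end coordinates clamped to the image subresource dimensions."""
    gw, gh = granularity
    # Round start coordinates down to the nearest granularity multiple.
    start_x = (image_offset[0] // gw) * gw
    start_y = (image_offset[1] // gh) * gh
    # Round end coordinates up to the nearest granularity multiple.
    end_x = -((-(image_offset[0] + image_extent[0])) // gw) * gw
    end_y = -((-(image_offset[1] + image_extent[1])) // gh) * gh
    # Clamp to the subresource extent.
    end_x = min(end_x, subresource_extent[0])
    end_y = min(end_y, subresource_extent[1])
    return (start_x, end_x), (start_y, end_y)
```

For example, a 1920x1080 picture with a 16x16 access granularity in a 1920x1088 subresource is accessed over the full 1920x1088 region, which is why decode images are often allocated with dimensions aligned to the coded macroblock size.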

In case of video decode operations using an H.264 decode profile, any access to a picture at the coordinates (x,y), as defined by the ITU-T H.264 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates specified below:

  • (x,y), if the accessed picture represents a frame.
  • (x,y × 2), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
  • (x,y × 2 + 1), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
  • (x,y), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
  • (codedOffset.x + x,codedOffset.y + y), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.

Where codedOffset is the member of the corresponding VkVideoPictureResourceInfoKHR structure.
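The five coordinate mapping cases above can be summarized in one function. This is an illustrative Python sketch (not Vulkan API code; the picture kinds "frame", "top", and "bottom" and the layout strings are stand-in labels):

```python
INTERLEAVED = "VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR"
SEPARATE = "VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR"

def texel_coords(x, y, kind, layout, coded_offset=(0, 0)):
    """Maps a codec-level (x, y) access to texel coordinates within the
    image subresource, per the H.264 decode picture layout rules."""
    if kind == "frame":
        return (x, y)
    if layout == INTERLEAVED:
        # Top and bottom field lines are interleaved within the frame.
        return (x, y * 2) if kind == "top" else (x, y * 2 + 1)
    # Separate-planes layout: the top field starts at the origin; the
    # bottom field is addressed through codedOffset.
    if kind == "top":
        return (x, y)
    return (coded_offset[0] + x, coded_offset[1] + y)
```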

H.264 Decode Profile

VkVideoDecodeH264ProfileInfoKHR - Structure specifying H.264 decode-specific video profile parameters
VkVideoDecodeH264PictureLayoutFlagBitsKHR - H.264 video decode picture layout flags
VkVideoDecodeH264PictureLayoutFlagsKHR - Bitmask of VkVideoDecodeH264PictureLayoutFlagBitsKHR

H.264 Decode Capabilities

VkVideoDecodeH264CapabilitiesKHR - Structure describing H.264 decode capabilities

H.264 Decode Parameter Sets

Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR can contain the following types of parameters:

H.264 Sequence Parameter Sets (SPS)

Represented by StdVideoH264SequenceParameterSet structures and interpreted as follows:

  • reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  • seq_parameter_set_id is used as the key of the SPS entry;
  • level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
  • if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
    • use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
    • ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  • if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
    • reserved1 is used only for padding purposes and is otherwise ignored;
    • if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      • reserved1 is used only for padding purposes and is otherwise ignored;
      • all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
    • all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
  • all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.
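The bitmask convention used by scaling_list_present_mask and use_default_scaling_matrix_mask above can be sketched as follows. The struct here is a local mirror of the two bitmask members for illustration only; the real StdVideoH264ScalingLists lives in the Video Std headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the two bitmask members of StdVideoH264ScalingLists,
 * for illustration; the real struct is defined in the Video Std headers. */
typedef struct {
    uint16_t scaling_list_present_mask;
    uint16_t use_default_scaling_matrix_mask;
} ScalingListMasks;

/* Packs per-list flags into the bitmasks, with bit index i corresponding
 * to list index i as described above (indices 0..5 are the 4x4 lists,
 * higher indices the 8x8 lists). */
static ScalingListMasks pack_scaling_list_masks(const uint8_t *presentFlags,
                                                const uint8_t *useDefaultFlags,
                                                int listCount)
{
    ScalingListMasks m = { 0, 0 };
    for (int i = 0; i < listCount; ++i) {
        if (presentFlags[i])
            m.scaling_list_present_mask |= (uint16_t)(1u << i);
        if (useDefaultFlags[i])
            m.use_default_scaling_matrix_mask |= (uint16_t)(1u << i);
    }
    return m;
}
```
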
H.264 Picture Parameter Sets (PPS)

Represented by StdVideoH264PictureParameterSet structures and interpreted as follows:

  • the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
  • if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
    • use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
    • ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  • all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.
VkVideoDecodeH264SessionParametersCreateInfoKHR: Structure specifies H.264 decoder parameter set information
VkVideoDecodeH264SessionParametersAddInfoKHR: Structure specifies H.264 decoder parameter set information

H.264 Decoding Parameters

VkVideoDecodeH264PictureInfoKHR: Structure specifies H.264 decode picture parameters when decoding a picture
VkVideoDecodeH264DpbSlotInfoKHR: Structure specifies H.264 decode DPB picture information

H.264 Decode Requirements

This section describes the required H.264 decoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.

H.265 Decode Operations

Video decode operations using an H.265 decode profile can be used to decode elementary video stream sequences compliant with the ITU-T H.265 Specification.

Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.

This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.265 Specification:

If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.265 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.

H.265 Decode Bitstream Data Access

The video bitstream buffer range should contain a VCL NAL unit comprised of the slice segment headers and data of a picture representing a frame, as defined in sections 7.3.6 and 7.3.8, and this data is interpreted as defined in sections 7.4.7 and 7.4.9 of the ITU-T H.265 Specification, respectively.

The offsets provided in VkVideoDecodeH265PictureInfoKHR::pSliceSegmentOffsets should specify the starting offsets corresponding to each slice segment header within the video bitstream buffer range.
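Locating candidate slice segment positions within the bitstream buffer typically involves scanning for Annex B start codes. The helper below is a hedged sketch of that scan, not a Vulkan API: a real parser must also inspect NAL unit types, handle 4-byte start codes, and apply whatever offset convention the decoder expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scans an Annex B elementary stream buffer for 3-byte start codes
 * (00 00 01) and records the byte offset of each one found, up to
 * maxOffsets entries. Returns the total number of start codes found.
 * Illustrative helper only; see the lead-in for its limitations. */
static size_t find_start_codes(const uint8_t *data, size_t size,
                               size_t *offsets, size_t maxOffsets)
{
    size_t count = 0;
    for (size_t i = 0; i + 3 <= size; ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (count < maxOffsets)
                offsets[count] = i;
            ++count;
            i += 2; /* skip past this start code */
        }
    }
    return count;
}
```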

H.265 Decode Picture Data Access

Accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile. Accordingly, the complete image subregion of a decode output picture, reference picture, or reconstructed picture accessed by video coding operations using an H.265 decode profile is defined as the set of texels within the coordinate range:

([0,endX),[0,endY))

Where:

  • endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
  • endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;

Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
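The endX and endY derivation above (round up to the access granularity, then clamp to the subresource extent) can be written as a one-dimensional helper; this is a sketch of the rule, not a Vulkan API.

```c
#include <assert.h>
#include <stdint.h>

/* Computes the exclusive end coordinate of the accessed image subregion:
 * the coded extent rounded up to the nearest multiple of the picture
 * access granularity, then clamped to the image subresource dimension,
 * per the definition above. Applies to either axis. */
static uint32_t access_region_end(uint32_t codedExtent,
                                  uint32_t accessGranularity,
                                  uint32_t subresourceExtent)
{
    uint32_t rounded =
        ((codedExtent + accessGranularity - 1) / accessGranularity) *
        accessGranularity;
    return rounded < subresourceExtent ? rounded : subresourceExtent;
}
```

For example, a 1080-texel coded extent with a granularity of 16 rounds up to 1088, which is then clamped back to 1080 when the subresource itself is only 1080 texels tall.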

In case of video decode operations using an H.265 decode profile, any access to a picture at the coordinates (x,y), as defined by the ITU-T H.265 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates (x,y).

H.265 Decode Profile

VkVideoDecodeH265ProfileInfoKHR: Structure specifying H.265 decode profile

H.265 Decode Capabilities

VkVideoDecodeH265CapabilitiesKHR: Structure describing H.265 decode capabilities

H.265 Decode Parameter Sets

Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR can contain the following types of parameters:

H.265 Video Parameter Sets (VPS)

Represented by StdVideoH265VideoParameterSet structures and interpreted as follows:

  • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  • vps_video_parameter_set_id is used as the key of the VPS entry;
  • the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to vps_max_latency_increase_plus1, vps_max_dec_pic_buffering_minus1, and vps_max_num_reorder_pics, respectively, as defined in section 7.4.3.1 of the ITU-T H.265 Specification;
  • the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    • reserved is used only for padding purposes and is otherwise ignored;
    • flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    • flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    • flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    • if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
      • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    • if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
      • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
  • the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    • general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  • all other members of StdVideoH265VideoParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
H.265 Sequence Parameter Sets (SPS)

Represented by StdVideoH265SequenceParameterSet structures and interpreted as follows:

  • reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  • the pair constructed from sps_video_parameter_set_id and sps_seq_parameter_set_id is used as the key of the SPS entry;
  • the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    • general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  • the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to sps_max_latency_increase_plus1, sps_max_dec_pic_buffering_minus1, and sps_max_num_reorder_pics, respectively, as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  • if flags.sps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    • ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  • pShortTermRefPicSet is a pointer to an array of num_short_term_ref_pic_sets number of StdVideoH265ShortTermRefPicSet structures where each element is interpreted as follows:
    • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    • used_by_curr_pic_flag is a bitmask where bit index i corresponds to used_by_curr_pic_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • use_delta_flag is a bitmask where bit index i corresponds to use_delta_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • used_by_curr_pic_s0_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s0_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • used_by_curr_pic_s1_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s1_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265ShortTermRefPicSet are interpreted as defined in section 7.4.8 of the ITU-T H.265 Specification;
  • if flags.long_term_ref_pics_present_flag is set then the StdVideoH265LongTermRefPicsSps structure pointed to by pLongTermRefPicsSps is interpreted as follows:
    • used_by_curr_pic_lt_sps_flag is a bitmask where bit index i corresponds to used_by_curr_pic_lt_sps_flag[i] as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265LongTermRefPicsSps are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  • if flags.vui_parameters_present_flag is set, then the StdVideoH265SequenceParameterSetVui structure pointed to by pSequenceParameterSetVui is interpreted as follows:
    • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    • the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      • flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      • flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      • flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      • if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
        • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      • if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
        • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      • all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
    • all other members of pSequenceParameterSetVui are interpreted as defined in section E.3.1 of the ITU-T H.265 Specification;
  • if flags.sps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  • all other members of StdVideoH265SequenceParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
H.265 Picture Parameter Sets (PPS)

Represented by StdVideoH265PictureParameterSet structures and interpreted as follows:

  • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  • the triplet constructed from sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id is used as the key of the PPS entry;
  • if flags.pps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    • ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  • if flags.pps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  • all other members of StdVideoH265PictureParameterSet are interpreted as defined in section 7.4.3.3 of the ITU-T H.265 Specification.
VkVideoDecodeH265SessionParametersCreateInfoKHR: Structure specifies H.265 decoder parameter set information
VkVideoDecodeH265SessionParametersAddInfoKHR: Structure specifies H.265 decoder parameter set information

H.265 Decoding Parameters

VkVideoDecodeH265PictureInfoKHR: Structure specifies H.265 picture information when decoding a frame
VkVideoDecodeH265DpbSlotInfoKHR: Structure specifies H.265 DPB information when decoding a frame

H.265 Decode Requirements

This section describes the required H.265 decoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.

AV1 Decode Operations

Video decode operations using an AV1 decode profile can be used to decode elementary video stream sequences compliant with the AV1 Specification.

Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.

This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 7 of the AV1 Specification:

If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the AV1 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.

AV1 Decode Bitstream Data Access

The video bitstream buffer range should contain one or more frame OBUs, comprised of a frame header OBU and tile group OBU, that together represent an entire frame, as defined in sections 5.10, 5.9, and 5.11, and this data is interpreted as defined in sections 6.9, 6.8, and 6.10 of the AV1 Specification, respectively.

The offset specified in VkVideoDecodeAV1PictureInfoKHR::frameHeaderOffset should specify the starting offset of the frame header OBU of the frame.

When the tiles of the frame are encoded into multiple tile groups, each frame OBU has a separate frame header OBU but their content is expected to match per the requirements of the AV1 Specification. Accordingly, the offset specified in frameHeaderOffset can be the offset of any of the otherwise identical frame header OBUs when multiple tile groups are present.

The offsets and sizes provided in VkVideoDecodeAV1PictureInfoKHR::pTileOffsets and VkVideoDecodeAV1PictureInfoKHR::pTileSizes, respectively, should specify the starting offsets and sizes corresponding to each tile within the video bitstream buffer range.
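An application-side sanity check for such tile offset and size arrays is straightforward; the helper below is an illustrative validation sketch, not a Vulkan API, checking that each tile lies entirely within the bitstream buffer range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checks that each (offset, size) pair, as would be passed through
 * pTileOffsets and pTileSizes, lies entirely within a bitstream buffer
 * range of the given size. The 64-bit sum guards against overflow. */
static bool tiles_within_range(const uint32_t *offsets, const uint32_t *sizes,
                               uint32_t tileCount, uint64_t rangeSize)
{
    for (uint32_t i = 0; i < tileCount; ++i) {
        uint64_t end = (uint64_t)offsets[i] + sizes[i];
        if (end > rangeSize)
            return false;
    }
    return true;
}
```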

AV1 Decode Picture Data Access

Accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile. Accordingly, the complete image subregion of a decode output picture, reference picture, or reconstructed picture accessed by video coding operations using an AV1 decode profile is defined as the set of texels within the coordinate range:

([0,endX),[0,endY))

Where:

  • endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
  • endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;

Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.

In case of video decode operations using an AV1 decode profile, any access to a picture at the coordinates (x,y), as defined by the AV1 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates (x,y).

AV1 Reference Names and Semantics

Individual reference frames used in the decoding process have different semantics, as defined in section 6.10.24 of the AV1 Specification. The AV1 semantics associated with a reference picture are indicated by the corresponding enumeration constant defined in the Video Std enumeration type StdVideoAV1ReferenceName:

  • STD_VIDEO_AV1_REFERENCE_NAME_INTRA_FRAME identifies the reference used for intra coding (INTRA_FRAME), as defined in sections 2 and 7.11.2 of the AV1 Specification.
  • All other enumeration constants refer to forward or backward references used for inter coding, as defined in sections 2 and 7.11.3 of the AV1 Specification:
    • STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME identifies the LAST_FRAME reference
    • STD_VIDEO_AV1_REFERENCE_NAME_LAST2_FRAME identifies the LAST2_FRAME reference
    • STD_VIDEO_AV1_REFERENCE_NAME_LAST3_FRAME identifies the LAST3_FRAME reference
    • STD_VIDEO_AV1_REFERENCE_NAME_GOLDEN_FRAME identifies the GOLDEN_FRAME reference
    • STD_VIDEO_AV1_REFERENCE_NAME_BWDREF_FRAME identifies the BWDREF_FRAME reference
    • STD_VIDEO_AV1_REFERENCE_NAME_ALTREF2_FRAME identifies the ALTREF2_FRAME reference
    • STD_VIDEO_AV1_REFERENCE_NAME_ALTREF_FRAME identifies the ALTREF_FRAME reference

These enumeration constants are not directly used in any APIs but are used to indirectly index into certain Video Std and Vulkan API parameter arrays.
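One common indirect-indexing pattern is addressing a seven-element per-reference parameter array by an inter reference name minus one, skipping INTRA_FRAME. The enum below is a local mirror of the StdVideoAV1ReferenceName constants and the helper is hypothetical; consult the Vulkan specification for the exact indexing rule of each array.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the StdVideoAV1ReferenceName constants, for
 * illustration; the real enum is defined in the Video Std headers. */
enum {
    REF_NAME_INTRA_FRAME   = 0,
    REF_NAME_LAST_FRAME    = 1,
    REF_NAME_LAST2_FRAME   = 2,
    REF_NAME_LAST3_FRAME   = 3,
    REF_NAME_GOLDEN_FRAME  = 4,
    REF_NAME_BWDREF_FRAME  = 5,
    REF_NAME_ALTREF2_FRAME = 6,
    REF_NAME_ALTREF_FRAME  = 7
};

/* Looks up the array element corresponding to an inter reference name
 * (LAST_FRAME through ALTREF_FRAME) in a 7-element per-reference array,
 * using the reference name minus one as the index. */
static int32_t slot_index_for_reference(const int32_t slotIndices[7],
                                        int referenceName)
{
    assert(referenceName >= REF_NAME_LAST_FRAME &&
           referenceName <= REF_NAME_ALTREF_FRAME);
    return slotIndices[referenceName - 1];
}
```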

AV1 Decode Profile

VkVideoDecodeAV1ProfileInfoKHR: Structure specifying AV1 decode profile

AV1 Decode Capabilities

VkVideoDecodeAV1CapabilitiesKHR: Structure describing AV1 decode capabilities

AV1 Decode Parameter Sets

Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR contain a single instance of the following parameter set:

AV1 Sequence Header

Represented by StdVideoAV1SequenceHeader structures and interpreted as follows:

  • flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  • the StdVideoAV1ColorConfig structure pointed to by pColorConfig is interpreted as follows:
    • flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
    • all other members of StdVideoAV1ColorConfig are interpreted as defined in section 6.4.2 of the AV1 Specification;
  • if flags.timing_info_present_flag is set, then the StdVideoAV1TimingInfo structure pointed to by pTimingInfo is interpreted as follows:
    • flags.reserved is used only for padding purposes and is otherwise ignored;
    • all other members of StdVideoAV1TimingInfo are interpreted as defined in section 6.4.3 of the AV1 Specification;
  • all other members of StdVideoAV1SequenceHeader are interpreted as defined in section 6.4 of the AV1 Specification.
VkVideoDecodeAV1SessionParametersCreateInfoKHR: Structure specifies AV1 decoder parameter set information

AV1 Decoding Parameters

VkVideoDecodeAV1PictureInfoKHR: Structure specifies AV1 picture information when decoding a frame
VkVideoDecodeAV1DpbSlotInfoKHR: Structure specifies AV1 DPB information when decoding a frame

AV1 Decode Requirements

This section describes the required AV1 decoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.

Video Encode Operations

Video encode operations consume an encode input picture and zero or more reference pictures, and produce compressed video data to a video bitstream buffer and an optional reconstructed picture.

Such encode input pictures can be produced by video decode operations, by graphics or compute operations, or through Window System Integration APIs, depending on the capabilities of the implementation.

Video encode operations may access the following resources in the VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR stage:

The image subresource of each video picture resource accessed by the video encode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:

  • If the image subresource is used in the video encode operation as an encode input picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR layout.
  • If the image subresource is used in the video encode operation as a reconstructed picture or reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_DPB_KHR layout.

A video encode operation may complete unsuccessfully. In this case the target video bitstream buffer will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.

If a video encode operation completes successfully and the codec-specific parameters provided by the application adhere to the syntactic and semantic requirements defined in the corresponding video compression standard, then the target video bitstream buffer will contain compressed video data after the execution of the video encode operation according to the respective codec-specific semantics.

Codec-Specific Semantics

The following aspects of video encode operations are codec-specific:

  • The compressed video data written to the target video bitstream buffer range.
  • The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
  • The construction and interpretation of information related to the encode input picture and the interpretation of the picture data referred to by the corresponding image subregion.
  • The decision on reference picture setup.
  • The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.
  • Certain aspects of rate control.

These codec-specific behaviors are defined for each video codec operation separately.

  • If the used video codec operation is VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then the codec-specific aspects of the video encoding process are performed as defined in the H.264 Encode Operations section.
  • If the used video codec operation is VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then the codec-specific aspects of the video encoding process are performed as defined in the H.265 Encode Operations section.

Video Encode Parameter Overrides

Implementations supporting video encode operations for any particular video codec operation often support only a subset of the available encoding tools defined by the corresponding video compression standards. Accordingly, certain implementation-dependent limitations may apply to codec-specific parameters provided through the structures defined in the Video Std headers corresponding to the used video codec operation.

Exposing all of these restrictions on particular codec-specific parameter values or combinations thereof in the form of application-queryable capabilities is impractical, hence this specification allows implementations to override the value of any of the codec-specific parameters, unless otherwise specified, as long as all of the following conditions are met:

  • If the application-provided codec-specific parameters adhere to the syntactic and semantic requirements and rules defined by the used video compression standard, and thus would be usable to produce a video bitstream compliant with that standard, then the codec-specific parameters resulting from the process of implementation overrides must also adhere to the same requirements and rules, and any video bitstream produced using the overridden parameters must also be compliant.
  • The overridden codec-specific parameter values must not have an impact on the codec-independent behaviors defined for video encode operations.
  • The implementation must not override any codec-specific parameters specified to a command that may cause application-provided codec-specific parameters specified to subsequent commands to no longer adhere to the semantic requirements and rules defined by the used video compression standard, unless the implementation also overrides those parameters to adhere to any such requirements and rules.
  • The overridden codec-specific parameter values must not have an impact on the codec-specific picture data access semantics.
  • The overridden codec-specific parameter values may change the contents of the codec-specific bitstream elements produced by video encode operations or otherwise retrieved by the application (e.g. using the vkGetEncodedVideoSessionParametersKHR command) but must still adhere to the codec-specific semantics defined for that video codec operation, including, but not limited to, the number, type, and order of the encoded codec-specific bitstream elements.

Besides codec-specific parameter overrides performed for implementation-dependent reasons, applications can enable the implementation to apply additional optimizing overrides that may improve the efficiency or performance of video encoding operations. However, implementations must meet the conditions listed above even in case of such optimizing overrides.

Unless the application opts in to optimizing overrides, implementations are not expected to override any of the codec-specific parameters, except when such overrides are necessary for the correct operation of the video encoder implementation due to limitations of the available encoding tools on that implementation.

Video Encode Operation Steps

Each video encode operation performs the following steps in the VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR stage:

  1. Reads the input picture data from the encode input picture;
  2. Determines derived encoding quality parameters according to the codec-specific semantics and the current rate control state;
  3. Compresses the input picture data according to the codec-specific semantics, applying any prediction data read from the active reference pictures and rate control restrictions in the process;
  4. Writes the encoded bitstream data to the destination video bitstream buffer range;
  5. Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process, if a reconstructed picture is specified and reference picture setup is requested;
  6. If reference picture setup is requested, the DPB slot index specified in the reconstructed picture information is activated with the reconstructed picture;
  7. Writes the reconstructed picture data to the reconstructed picture, if one is specified, according to the codec-specific semantics.

When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.

Capabilities

VkVideoEncodeCapabilitiesKHRStructure describing general video encode capabilities for a video profile
VkVideoEncodeCapabilityFlagBitsKHRVideo encode capability flags
VkVideoEncodeCapabilityFlagsKHRBitmask of VkVideoEncodeCapabilityFlagBitsKHR

Video Encode Quality Levels

Implementations can support more than one video encode quality level for a video encode profile. Quality levels control the number and type of implementation-specific encoding tools and algorithms utilized in the encoding process.

Generally, using higher video encode quality levels may produce higher quality video streams at the cost of additional processing time. However, as the final quality of an encoded picture depends on the contents of the encode input picture, the contents of the active reference pictures, the codec-specific encode parameters, and the particular implementation-specific tools used corresponding to the individual video encode quality levels, there are no guarantees that using a higher video encode quality level will always produce a higher quality encoded picture for any given set of inputs.

vkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHRQuery video encode quality level properties
VkPhysicalDeviceVideoEncodeQualityLevelInfoKHRStructure describing the video encode profile and quality level to query properties for
VkVideoEncodeQualityLevelPropertiesKHRStructure describing the video encode quality level properties
VkVideoEncodeQualityLevelInfoKHRStructure specifying used video encode quality level

Retrieving Encoded Session Parameters

Any codec-specific parameters stored in video session parameters objects may need to be separately encoded and included in the final video bitstream data, depending on the used video compression standard. In such cases the application must call the vkGetEncodedVideoSessionParametersKHR command to retrieve the encoded parameter data from the used video session parameters object in order to be able to produce a compliant video bitstream.

This is needed because implementations may have changed some of the codec-specific parameters stored in the video session parameters object, as defined in the Video Encode Parameter Overrides section. In addition, the vkGetEncodedVideoSessionParametersKHR command enables the application to retrieve the encoded parameter data without having to encode these codec-specific parameters manually.

vkGetEncodedVideoSessionParametersKHRGet encoded parameter sets from a video session parameters object
VkVideoEncodeSessionParametersGetInfoKHRStructure specifying parameters for retrieving encoded video session parameter data
VkVideoEncodeSessionParametersFeedbackInfoKHRStructure providing feedback about the requested video session parameters

Video Encode Commands

vkCmdEncodeVideoKHRLaunch video encode operations
VkVideoEncodeInfoKHRStructure specifying video encode parameters
VkVideoEncodeFlagsKHRReserved for future use

Video Encode Rate Control

The size of the encoded bitstream data produced by video encode operations is a function of the following set of constraints:

  • The capabilities of the compression algorithms defined and employed by the used video compression standard;
  • Restrictions imposed by the selected video profile according to the rules defined by the used video compression standard;
  • Further restrictions imposed by the capabilities supported by the implementation for the selected video profile;
  • The image data in the encode input picture and the set of active reference pictures (as these affect the effectiveness of the compression algorithms employed by the video encode operations);
  • The set of codec-specific and codec-independent encoding parameters provided by the application.

These also inherently define the set of decoder capabilities required for reconstructing and processing the picture data in the encoded bitstream.

Video coding uses bitrate as the quantitative metric associated with encoded bitstream data size: it expresses the rate at which video bitstream data can be transferred or processed, measured in bits per second. This bitrate is a function of both the encoded bitstream data size of the encoded pictures and the frame rate of the video sequence.

Rate control algorithms are used by video encode operations to enable adjusting encoding parameters to achieve a target bitrate, or otherwise directly or indirectly control the bitrate of the generated video bitstream data. These algorithms are usually not defined by the used video compression standard, although some video compression standards do provide non-normative guidelines for implementations.

Accordingly, this specification does not require implementations to produce identical encoded bitstream data outputs in response to video encode operations. However, it does define a set of codec-independent and codec-specific parameters that enable the application to control the behavior of the rate control algorithms supported by the implementation. Some of these parameters guarantee certain implementation behavior, while others provide guidance for implementations to apply various rate control heuristics.

Applications need to configure rate control parameters appropriately and follow the promises they make to the implementation through parameters that provide guidance for the implementation’s rate control algorithms and heuristics, in order to get the desired rate control behavior and hit the set bitrate targets. In addition, the behavior of rate control may differ across implementations even if the capabilities of the used video profile match between those implementations. This can happen because implementations may apply different rate control algorithms or heuristics internally, so even the same set of guidance parameter values may have different effects on the rate control behavior across implementations.

Rate Control Modes

After a video session is reset to the initial state, the default behavior and parameters of video encode rate control are entirely implementation-dependent and the application cannot affect the bitrate or quality parameters of the encoded bitstream data produced by video encode operations unless the application changes the rate control configuration of the video session, as described in the Video Coding Control section.

For each supported video profile, the implementation may expose a set of rate control modes that are available for use by the application when encoding bitstreams targeting that video profile. These modes allow using different rate control algorithms that fall into one of the following two categories:

  1. Per-operation rate control
  2. Stream-level rate control

In case of per-operation rate control, the bitrate of the generated video bitstream data is indirectly controlled by quality, size, or other encoding parameters specified by the application for each individual video encode operation.

In case of stream-level rate control, the application can directly specify target bitrates besides other encoding parameters to control the behavior of the rate control algorithm used by the implementation across multiple video encode operations.

VkVideoEncodeRateControlModeFlagBitsKHRVideo encode rate control modes
VkVideoEncodeRateControlModeFlagsKHRBitmask of VkVideoEncodeRateControlModeFlagBitsKHR

Leaky Bucket Model

Video encoding implementations use the leaky bucket model for stream-level rate control. The leaky bucket is a concept referring to the interface between the video encoder and the consumer (for example, a network connection): the video encoder produces encoded bitstream data corresponding to the encoded pictures and adds it to the leaky bucket, while its contents are drained by the consumer.

Analogously, a similar leaky bucket is considered to exist at the input interface of a video decoder, into which encoded bitstream data is continuously added and is subsequently consumed by the video decoder. It is desirable to avoid overflowing or underflowing this leaky bucket because:

  • In case of an underflow, the video decoder will be unable to consume encoded bitstream data in order to decode pictures (and optionally display them).
  • In case of an overflow, the leaky bucket will be unable to accommodate more encoded bitstream data and such data may need to be thrown away, leading to the loss of the corresponding encoded pictures.

These requirements can be satisfied by imposing various constraints on the encoder-side leaky bucket to avoid its overflow or underflow, depending on the used rate control algorithm and codec parameters. However, enumerating these constraints is outside the scope of this specification.

The term virtual buffer is often used as an alternative to refer to the leaky bucket.

This virtual buffer model is defined by the following parameters:

  • The bitrate (R) at which the encoded bitstream is expected to be processed.
  • The size (B) of the virtual buffer.
  • The initial occupancy (F) of the virtual buffer.

In this model the virtual buffer is used to smooth out fluctuations in the bitrate of the encoded bitstream over time without experiencing buffer overflow or underflow, as long as the bitrate of the encoded stream does not diverge from the target bitrate for extended periods of time.

This buffering may inherently impose a processing delay, as the goal of the model is to enable decoders to maintain a consistent processing rate of an encoded bitstream with varying data rate.

The initial or start-up delay (D) is computed as:

D = F / R

Applications need to configure the virtual buffer with sufficient size to avoid or minimize buffer overflows and underflows while also keeping it small enough to meet their latency goals.

Rate Control Layers

Some video compression standards and video profiles allow associating encoded pictures with specific video coding layers. The name, identification, and semantics associated with such video coding layers are defined by the corresponding video compression standards.

Analogously, stream-level rate control can be configured to use one or more rate control layers:

  • When a single rate control layer is configured, it is applied to all encoded pictures, regardless of the picture’s video coding layer. In this case the distribution of the available bitrate budget across video coding layers is implementation-dependent.
  • When multiple rate control layers are configured, each rate control layer is applied to the corresponding video coding layer, i.e. only across encoded pictures pertaining to the corresponding video coding layer.

Individual rate control layers are identified using layer indices between zero and N-1, where N is the number of active rate control layers.

Rate control layers are only applicable when using stream-level rate control modes.

Rate Control State

Rate control state is maintained by the implementation as part of the video session object, and its parameters are specified using an instance of the VkVideoEncodeRateControlInfoKHR structure. The complete rate control state of a video session is defined by the following set of parameters:

Two rate control states match if all the parameters listed above match between them.

VkVideoEncodeRateControlInfoKHRStructure to set encode stream rate control parameters
VkVideoEncodeRateControlFlagsKHRReserved for future use

Rate Control Layer State

The configuration of individual rate control layers is specified using an instance of the VkVideoEncodeRateControlLayerInfoKHR structure.

VkVideoEncodeRateControlLayerInfoKHRStructure to set encode per-layer rate control parameters

H.264 Encode Operations

Video encode operations using an H.264 encode profile can be used to encode elementary video stream sequences compliant to the ITU-T H.264 Specification.

Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.

This process is performed according to the video encode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:

If the parameters adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video encode operation will complete successfully. Otherwise, the video encode operation may complete unsuccessfully.

H.264 Encode Parameter Overrides

Implementations may override, unless otherwise specified, any of the H.264 encode parameters specified in the following Video Std structures:

  • StdVideoH264SequenceParameterSet
  • StdVideoH264PictureParameterSet
  • StdVideoEncodeH264PictureInfo
  • StdVideoEncodeH264SliceHeader
  • StdVideoEncodeH264ReferenceInfo

All such H.264 encode parameter overrides must fulfill the conditions defined in the Video Encode Parameter Overrides section.

In addition, implementations must not override any of the following H.264 encode parameters:

  • StdVideoEncodeH264PictureInfo::primary_pic_type
  • StdVideoEncodeH264SliceHeader::slice_type

In case of H.264 encode parameters stored in video session parameters objects, applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened. If the query indicates that implementation overrides were applied, then the application needs to retrieve and use the encoded H.264 parameter sets in the bitstream in order to be able to produce a compliant H.264 video bitstream using the H.264 encode parameters stored in the video session parameters object.

In case of any H.264 encode parameters stored in the encoded bitstream produced by video encode operations, if the implementation supports the VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR video encode feedback query flag, the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to those H.264 encode parameters.

H.264 Encode Bitstream Data Access

Each video encode operation writes one or more VCL NAL units comprising the slice headers and data of the encoded picture, in the format defined in sections 7.3.3 and 7.3.4, according to the semantics defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively. The number of VCL NAL units written is specified by VkVideoEncodeH264PictureInfoKHR::naluSliceEntryCount.

In addition, if VkVideoEncodeH264PictureInfoKHR::generatePrefixNalu is set to VK_TRUE for the video encode operation, then an additional prefix NAL unit is written before each VCL NAL unit corresponding to individual slices in the format defined in section 7.3.2.12, according to the semantics defined in section 7.4.2.12 of the ITU-T H.264 Specification.

H.264 Encode Picture Data Access

Accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile. Accordingly, the complete image subregion of an encode input picture, reference picture, or reconstructed picture accessed by video coding operations using an H.264 encode profile is defined as the set of texels within the coordinate range:

([0,endX),[0,endY))

Where:

  • endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
  • endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.

Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.

In case of video encode operations using an H.264 encode profile, any access to a picture at the coordinates (x,y), as defined by the ITU-T H.264 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates (x,y).

Implementations may choose not to access some or all texels within particular reference pictures available to a video encode operation (e.g. due to video encode parameter overrides restricting the effective set of used reference pictures, or if the encoding algorithm chooses not to use certain subregions of the reference picture data for sample prediction).

H.264 Frame, Picture, and Slice

H.264 pictures are partitioned into slices, as defined in section 6.3 of the ITU-T H.264 Specification.

Video encode operations using an H.264 encode profile can encode slices of different types, as defined in section 7.4.3 of the ITU-T H.264 Specification, by specifying the corresponding enumeration constant value in StdVideoEncodeH264SliceHeader::slice_type in the H.264 slice header parameters from the Video Std enumeration type StdVideoH264SliceType:

  • STD_VIDEO_H264_SLICE_TYPE_P indicates that the slice is a P slice as defined in section 3.109 of the ITU-T H.264 Specification.
  • STD_VIDEO_H264_SLICE_TYPE_B indicates that the slice is a B slice as defined in section 3.9 of the ITU-T H.264 Specification.
  • STD_VIDEO_H264_SLICE_TYPE_I indicates that the slice is an I slice as defined in section 3.66 of the ITU-T H.264 Specification.

Pictures constructed from such slices can be of different types, as defined in section 7.4.2.4 of the ITU-T H.264 Specification. Video encode operations using an H.264 encode profile can encode pictures of a specific type by specifying the corresponding enumeration constant value in StdVideoEncodeH264PictureInfo::primary_pic_type in the H.264 picture information from the Video Std enumeration type StdVideoH264PictureType:

  • STD_VIDEO_H264_PICTURE_TYPE_P indicates that the picture is a P picture. A frame consisting of a P picture is also referred to as a P frame.
  • STD_VIDEO_H264_PICTURE_TYPE_B indicates that the picture is a B picture. A frame consisting of a B picture is also referred to as a B frame.
  • STD_VIDEO_H264_PICTURE_TYPE_I indicates that the picture is an I picture. A frame consisting of an I picture is also referred to as an I frame.
  • STD_VIDEO_H264_PICTURE_TYPE_IDR indicates that the picture is a special type of I picture called an IDR picture as defined in section 3.69 of the ITU-T H.264 Specification. A frame consisting of an IDR picture is also referred to as an IDR frame.

H.264 Encode Profile

VkVideoEncodeH264ProfileInfoKHRStructure specifying H.264 encode-specific video profile parameters

H.264 Encode Capabilities

VkVideoEncodeH264CapabilitiesKHRStructure describing H.264 encode capabilities
VkVideoEncodeH264CapabilityFlagBitsKHRH.264 encode capability flags
VkVideoEncodeH264CapabilityFlagsKHRBitmask of VkVideoEncodeH264CapabilityFlagBitsKHR
VkVideoEncodeH264StdFlagBitsKHRVideo encode H.264 syntax capability flags
VkVideoEncodeH264StdFlagsKHRBitmask of VkVideoEncodeH264StdFlagBitsKHR

H.264 Encode Quality Level Properties

VkVideoEncodeH264QualityLevelPropertiesKHRStructure describing the H.264 encode quality level properties

H.264 Encode Session

Additional parameters can be specified when creating a video session with an H.264 encode profile by including an instance of the VkVideoEncodeH264SessionCreateInfoKHR structure in the pNext chain of VkVideoSessionCreateInfoKHR.

VkVideoEncodeH264SessionCreateInfoKHRStructure specifies H.264 encode session parameters

H.264 Encode Parameter Sets

Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR can contain the following types of parameters:

H.264 Sequence Parameter Sets (SPS)

Represented by StdVideoH264SequenceParameterSet structures and interpreted as follows:

  • reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  • seq_parameter_set_id is used as the key of the SPS entry;
  • level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
  • if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
    • use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
    • ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  • if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
    • reserved1 is used only for padding purposes and is otherwise ignored;
    • if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      • reserved1 is used only for padding purposes and is otherwise ignored;
      • all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
    • all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
  • all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.

H.264 Picture Parameter Sets (PPS)

Represented by StdVideoH264PictureParameterSet structures and interpreted as follows:

  • the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
  • if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
    • use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
    • ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  • all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.

Implementations may override any of these parameters according to the semantics defined in the Video Encode Parameter Overrides section before storing the resulting H.264 parameter sets into the video session parameters object. Applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened and to retrieve the encoded H.264 parameter sets in order to be able to produce a compliant H.264 video bitstream.

Such H.264 parameter set overrides may also have cascading effects on the implementation overrides applied to the encoded bitstream produced by video encode operations. If the implementation supports the VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR video encode feedback query flag, then the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to the encoded bitstream.

VkVideoEncodeH264SessionParametersCreateInfoKHRStructure specifies H.264 encoder parameter set information
VkVideoEncodeH264SessionParametersAddInfoKHRStructure specifies H.264 encoder parameter set information
VkVideoEncodeH264SessionParametersGetInfoKHRStructure specifying parameters for retrieving encoded H.264 parameter set data
VkVideoEncodeH264SessionParametersFeedbackInfoKHRStructure providing feedback about the requested H.264 video session parameters

H.264 Encoding Parameters

VkVideoEncodeH264PictureInfoKHRStructure specifies H.264 encode frame parameters
VkVideoEncodeH264NaluSliceInfoKHRStructure specifies H.264 encode slice NALU parameters
VkVideoEncodeH264DpbSlotInfoKHRStructure specifies H.264 encode DPB picture information

H.264 Encode Rate Control

Group of Pictures

In case of H.264 encoding it is common practice to follow a regular pattern of different picture types in display order when encoding subsequent frames. This pattern is referred to as the group of pictures (GOP).

A regular GOP is defined by the following parameters:

  • The number of frames in the GOP;
  • The number of consecutive B frames between I and/or P frames in display order.

GOPs are further classified as open and closed GOPs.

Frame types in an open GOP follow each other in display order according to the following algorithm:

  1. The first frame is always an I frame.
  2. This is followed by a number of consecutive B frames, as defined above.
  3. If the number of frames in the GOP is not reached yet, then the next frame is a P frame and the algorithm continues from step 2.

In case of a closed GOP, an IDR frame is used at a certain period.

It is also typical for H.264 encoding to use specific reference picture usage patterns across the frames of the GOP. The two most common reference patterns used are as follows:

Flat Reference Pattern
  • Each P frame uses the last non-B frame, in display order, as reference.
  • Each B frame uses the last non-B frame, in display order, as its backward reference, and uses the next non-B frame, in display order, as its forward reference.
Dyadic Reference Pattern
  • Each P frame uses the last non-B frame, in display order, as reference.
  • The following algorithm is applied to the sequence of consecutive B frames between I and/or P frames in display order:
    1. The B frame in the middle of this sequence uses the frame preceding the sequence as its backward reference, and uses the frame following the sequence as its forward reference.
    2. The algorithm is executed recursively for the following frame sequences:
      • The B frames of the original sequence preceding the frame in the middle, if any.
      • The B frames of the original sequence following the frame in the middle, if any.

The application can provide guidance to the implementation’s rate control algorithm about the structure of the GOP used by the application. Such guidance does not mandate the use of that specific GOP structure, as the picture type of individual encoded pictures remains application-controlled. However, any deviation from the provided guidance may result in undesired rate control behavior, including, but not limited to, the implementation not being able to conform to the expected average or target bitrates, or to other rate control parameters specified by the application.

When an H.264 encode session is used to encode multiple temporal layers, it is also common practice to follow a regular pattern for the H.264 temporal ID for the encoded pictures in display order when encoding subsequent frames. This pattern is referred to as the temporal GOP. The most common temporal layer pattern used is as follows:

Dyadic Temporal Layer Pattern
  • The number of frames in the temporal GOP is 2^(n-1), where n is the number of temporal layers.
  • The ith frame in the temporal GOP uses temporal ID t, if and only if the index of the least significant bit set in i equals n-t-1, except for the first frame, which is the only frame in the temporal GOP using temporal ID zero.
  • The ith frame in the temporal GOP uses the rth frame as reference, where r is calculated from i by clearing the least significant bit set in it, except for the first frame in the temporal GOP, which uses the first frame of the previous temporal GOP, if any, as reference.

Multi-layer rate control and multi-layer coding are typically used for streaming cases where low latency is expected, hence B pictures with forward prediction are usually not used.

VkVideoEncodeH264RateControlInfoKHRStructure describing H.264 stream rate control parameters
VkVideoEncodeH264RateControlFlagBitsKHRH.264 encode rate control bits
VkVideoEncodeH264RateControlFlagsKHRBitmask specifying H.264 encode rate control flags

Rate Control Layers

VkVideoEncodeH264RateControlLayerInfoKHRStructure describing H.264 per-layer rate control parameters
VkVideoEncodeH264QpKHRStructure describing H.264 QP values per picture type
VkVideoEncodeH264FrameSizeKHRStructure describing frame size values per H.264 picture type

GOP Remaining Frames

Besides the session-level rate control configuration, the application can specify the number of frames per frame type remaining in the group of pictures (GOP).

VkVideoEncodeH264GopRemainingFrameInfoKHRStructure specifying H.264 encode rate control GOP remaining frame counts

H.264 Encode Requirements

This section describes the required H.264 encoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.

H.265 Encode Operations

Video encode operations using an H.265 encode profile can be used to encode elementary video stream sequences compliant to the ITU-T H.265 Specification.

Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.

This process is performed according to the video encode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.265 Specification as follows:

If the parameters adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.265 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video encode operation will complete successfully. Otherwise, the video encode operation may complete unsuccessfully.

H.265 Encode Parameter Overrides

Implementations may override, unless otherwise specified, any of the H.265 encode parameters specified in the following Video Std structures:

  • StdVideoH265VideoParameterSet
  • StdVideoH265SequenceParameterSet
  • StdVideoH265PictureParameterSet
  • StdVideoEncodeH265PictureInfo
  • StdVideoEncodeH265SliceSegmentHeader
  • StdVideoEncodeH265ReferenceInfo

All such H.265 encode parameter overrides must fulfill the conditions defined in the Video Encode Parameter Overrides section.

In addition, implementations must not override any of the following H.265 encode parameters:

  • StdVideoEncodeH265PictureInfo::pic_type
  • StdVideoEncodeH265SliceSegmentHeader::slice_type

In the case of H.265 encode parameters stored in video session parameters objects, applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened. If the query indicates that implementation overrides were applied, then the application needs to retrieve and use the encoded H.265 parameter sets in the bitstream in order to produce a compliant H.265 video bitstream from the H.265 encode parameters stored in the video session parameters object.

In the case of any H.265 encode parameters stored in the encoded bitstream produced by video encode operations, if the implementation supports the VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR video encode feedback query flag, the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to those H.265 encode parameters.

H.265 Encode Bitstream Data Access

Each video encode operation writes one or more VCL NAL units consisting of the slice segment headers and data of the encoded picture, in the format defined in sections 7.3.6 and 7.3.8, according to the semantics defined in sections 7.4.7 and 7.4.9 of the ITU-T H.265 Specification, respectively. The number of VCL NAL units written is specified by VkVideoEncodeH265PictureInfoKHR::naluSliceSegmentEntryCount.

H.265 Encode Picture Data Access

Accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile. Accordingly, the complete image subregion of an encode input picture, reference picture, or reconstructed picture accessed by video coding operations using an H.265 encode profile is defined as the set of texels within the coordinate range:

([0,endX),[0,endY))

Where:

  • endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
  • endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;

Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
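The rounding and clamping rules above can be sketched in plain C. This is an illustration only; the Extent2D struct below stands in for VkExtent2D so that the sketch needs no Vulkan headers:

```c
#include <assert.h>

/* Stand-in for VkExtent2D, so the sketch is self-contained. */
typedef struct { unsigned width, height; } Extent2D;

/* Round coded up to the nearest integer multiple of granularity, then
 * clamp to the extent of the image subresource. */
static unsigned round_up_and_clamp(unsigned coded, unsigned granularity,
                                   unsigned subresource) {
    unsigned rounded = ((coded + granularity - 1) / granularity) * granularity;
    return rounded < subresource ? rounded : subresource;
}

/* The accessed subregion is [0, endX) x [0, endY). */
static Extent2D accessed_region(Extent2D codedExtent, Extent2D granularity,
                                Extent2D subresourceExtent) {
    Extent2D end;
    end.width  = round_up_and_clamp(codedExtent.width,  granularity.width,
                                    subresourceExtent.width);
    end.height = round_up_and_clamp(codedExtent.height, granularity.height,
                                    subresourceExtent.height);
    return end;
}
```

For example, with a coded extent of 1920×1080, a picture access granularity of 16×16, and a subresource extent of 1920×1088, the accessed region is ([0,1920),[0,1088)): the height rounds up to 1088 before clamping.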

In case of video encode operations using an H.265 encode profile, any access to a picture at the coordinates (x,y), as defined by the ITU-T H.265 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates (x,y).

Implementations may choose not to access some or all texels within particular reference pictures available to a video encode operation (e.g. due to video encode parameter overrides restricting the effective set of used reference pictures, or if the encoding algorithm chooses not to use certain subregions of the reference picture data for sample prediction).

H.265 Frame, Picture, Slice Segments, and Tiles

H.265 pictures consist of one or more slices, slice segments, and tiles, as defined in section 6.3.1 of the ITU-T H.265 Specification.

Video encode operations using an H.265 encode profile can encode slice segments of different types, as defined in section 7.4.7.1 of the ITU-T H.265 Specification, by specifying the corresponding enumeration constant value in StdVideoEncodeH265SliceSegmentHeader::slice_type in the H.265 slice segment header parameters from the Video Std enumeration type StdVideoH265SliceType:

  • STD_VIDEO_H265_SLICE_TYPE_B indicates that the slice segment is part of a B slice as defined in section 3.12 of the ITU-T H.265 Specification.
  • STD_VIDEO_H265_SLICE_TYPE_P indicates that the slice segment is part of a P slice as defined in section 3.111 of the ITU-T H.265 Specification.
  • STD_VIDEO_H265_SLICE_TYPE_I indicates that the slice segment is part of an I slice as defined in section 3.74 of the ITU-T H.265 Specification.

Pictures constructed from such slice segments can be of different types, as defined in section 7.4.3.5 of the ITU-T H.265 Specification. Video encode operations using an H.265 encode profile can encode pictures of a specific type by specifying the corresponding enumeration constant value in StdVideoEncodeH265PictureInfo::pic_type in the H.265 picture information from the Video Std enumeration type StdVideoH265PictureType:

  • STD_VIDEO_H265_PICTURE_TYPE_P indicates that the picture is a P picture. A frame consisting of a P picture is also referred to as a P frame.
  • STD_VIDEO_H265_PICTURE_TYPE_B indicates that the picture is a B picture. A frame consisting of a B picture is also referred to as a B frame.
  • STD_VIDEO_H265_PICTURE_TYPE_I indicates that the picture is an I picture. A frame consisting of an I picture is also referred to as an I frame.
  • STD_VIDEO_H265_PICTURE_TYPE_IDR indicates that the picture is a special type of I picture called an IDR picture as defined in section 3.67 of the ITU-T H.265 Specification. A frame consisting of an IDR picture is also referred to as an IDR frame.

H.265 Encode Profile

VkVideoEncodeH265ProfileInfoKHRStructure specifying H.265 encode-specific video profile parameters

H.265 Encode Capabilities

VkVideoEncodeH265CapabilitiesKHRStructure describing H.265 encode capabilities
VkVideoEncodeH265CapabilityFlagBitsKHRVideo encode H.265 capability flags
VkVideoEncodeH265CapabilityFlagsKHRBitmask of VkVideoEncodeH265CapabilityFlagBitsKHR
VkVideoEncodeH265StdFlagBitsKHRVideo encode H.265 syntax capability flags
VkVideoEncodeH265StdFlagsKHRBitmask of VkVideoEncodeH265StdFlagBitsKHR
VkVideoEncodeH265CtbSizeFlagBitsKHRSupported CTB sizes for H.265 video encode
VkVideoEncodeH265CtbSizeFlagsKHRBitmask of VkVideoEncodeH265CtbSizeFlagBitsKHR
VkVideoEncodeH265TransformBlockSizeFlagBitsKHRSupported transform block sizes for H.265 video encode
VkVideoEncodeH265TransformBlockSizeFlagsKHRBitmask of VkVideoEncodeH265TransformBlockSizeFlagBitsKHR

H.265 Encode Quality Level Properties

VkVideoEncodeH265QualityLevelPropertiesKHRStructure describing the H.265 encode quality level properties

H.265 Encode Session

Additional parameters can be specified when creating a video session with an H.265 encode profile by including an instance of the VkVideoEncodeH265SessionCreateInfoKHR structure in the pNext chain of VkVideoSessionCreateInfoKHR.

VkVideoEncodeH265SessionCreateInfoKHRStructure specifies H.265 encode session parameters

H.265 Encode Parameter Sets

Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR can contain the following types of parameters:

H.265 Video Parameter Sets (VPS)

Represented by StdVideoH265VideoParameterSet structures and interpreted as follows:

  • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  • vps_video_parameter_set_id is used as the key of the VPS entry;
  • the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to vps_max_latency_increase_plus1, vps_max_dec_pic_buffering_minus1, and vps_max_num_reorder_pics, respectively, as defined in section 7.4.3.1 of the ITU-T H.265 Specification;
  • the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    • reserved is used only for padding purposes and is otherwise ignored;
    • flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    • flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    • flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    • if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
      • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    • if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
      • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
  • the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    • general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  • all other members of StdVideoH265VideoParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
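Several of the Video Std members above (for example flags.fixed_pic_rate_general_flag, flags.low_delay_hrd_flag, and cbr_flag) pack one bit per sub-layer into a single bitmask, where bit index i corresponds to the i-th array element of the equivalent ITU-T H.265 syntax element. A minimal C helper for reading such a packed flag might look like this (illustrative only, not part of the API):

```c
#include <assert.h>
#include <stdint.h>

/* Read per-sub-layer flag i from a packed Video Std bitmask, where bit
 * index i corresponds to flag[i] as defined by the ITU-T H.265
 * Specification. */
static int sub_layer_flag(uint32_t bitmask, int i) {
    return (int)((bitmask >> i) & 1u);
}
```

For example, a cbr_flag bitmask of 0x5 (binary 101) indicates cbr_flag[0] = 1, cbr_flag[1] = 0, and cbr_flag[2] = 1.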
H.265 Sequence Parameter Sets (SPS)

Represented by StdVideoH265SequenceParameterSet structures and interpreted as follows:

  • reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  • the pair constructed from sps_video_parameter_set_id and sps_seq_parameter_set_id is used as the key of the SPS entry;
  • the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    • general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  • the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to sps_max_latency_increase_plus1, sps_max_dec_pic_buffering_minus1, and sps_max_num_reorder_pics, respectively, as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  • if flags.sps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    • ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  • pShortTermRefPicSet is a pointer to an array of num_short_term_ref_pic_sets number of StdVideoH265ShortTermRefPicSet structures where each element is interpreted as follows:
    • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    • used_by_curr_pic_flag is a bitmask where bit index i corresponds to used_by_curr_pic_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • use_delta_flag is a bitmask where bit index i corresponds to use_delta_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • used_by_curr_pic_s0_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s0_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • used_by_curr_pic_s1_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s1_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265ShortTermRefPicSet are interpreted as defined in section 7.4.8 of the ITU-T H.265 Specification;
  • if flags.long_term_ref_pics_present_flag is set then the StdVideoH265LongTermRefPicsSps structure pointed to by pLongTermRefPicsSps is interpreted as follows:
    • used_by_curr_pic_lt_sps_flag is a bitmask where bit index i corresponds to used_by_curr_pic_lt_sps_flag[i] as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
    • all other members of StdVideoH265LongTermRefPicsSps are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  • if flags.vui_parameters_present_flag is set, then the StdVideoH265SequenceParameterSetVui structure pointed to by pSequenceParameterSetVui is interpreted as follows:
    • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    • the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      • flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      • flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      • flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      • if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
        • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      • if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
        • cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        • all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      • all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
    • all other members of pSequenceParameterSetVui are interpreted as defined in section E.3.1 of the ITU-T H.265 Specification;
  • if flags.sps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  • all other members of StdVideoH265SequenceParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
H.265 Picture Parameter Sets (PPS)

Represented by StdVideoH265PictureParameterSet structures and interpreted as follows:

  • reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  • the triplet constructed from sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id is used as the key of the PPS entry;
  • if flags.pps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    • ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    • ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  • if flags.pps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  • all other members of StdVideoH265PictureParameterSet are interpreted as defined in section 7.4.3.3 of the ITU-T H.265 Specification.

Implementations may override any of these parameters according to the semantics defined in the Video Encode Parameter Overrides section before storing the resulting H.265 parameter sets into the video session parameters object. Applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened and to retrieve the encoded H.265 parameter sets in order to be able to produce a compliant H.265 video bitstream.

Such H.265 parameter set overrides may also have cascading effects on the implementation overrides applied to the encoded bitstream produced by video encode operations. If the implementation supports the VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR video encode feedback query flag, then the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to the encoded bitstream.

VkVideoEncodeH265SessionParametersCreateInfoKHRStructure specifies H.265 encoder parameter set information
VkVideoEncodeH265SessionParametersAddInfoKHRStructure specifies H.265 encoder parameter set information
VkVideoEncodeH265SessionParametersGetInfoKHRStructure specifying parameters for retrieving encoded H.265 parameter set data
VkVideoEncodeH265SessionParametersFeedbackInfoKHRStructure providing feedback about the requested H.265 video session parameters

H.265 Encoding Parameters

VkVideoEncodeH265PictureInfoKHRStructure specifies H.265 encode frame parameters
VkVideoEncodeH265NaluSliceSegmentInfoKHRStructure specifies H.265 encode slice segment NALU parameters
VkVideoEncodeH265DpbSlotInfoKHRStructure specifies H.265 encode DPB picture information

H.265 Encode Rate Control

Group of Pictures

In the case of H.265 encoding, it is common practice to follow a regular pattern of different picture types in display order when encoding subsequent frames. This pattern is referred to as the group of pictures (GOP).

A regular GOP is defined by the following parameters:

  • The number of frames in the GOP;
  • The number of consecutive B frames between I and/or P frames in display order.

GOPs are further classified as open and closed GOPs.

Frame types in an open GOP follow each other in display order according to the following algorithm:

  1. The first frame is always an I frame.
  2. This is followed by a number of consecutive B frames, as defined above.
  3. If the number of frames in the GOP is not reached yet, then the next frame is a P frame and the algorithm continues from step 2.

In case of a closed GOP, an IDR frame is used at a certain period.

It is also typical for H.265 encoding to use specific reference picture usage patterns across the frames of the GOP. The two most common reference patterns used are as follows:

Flat Reference Pattern
  • Each P frame uses the last non-B frame, in display order, as reference.
  • Each B frame uses the last non-B frame, in display order, as its backward reference, and uses the next non-B frame, in display order, as its forward reference.
Dyadic Reference Pattern
  • Each P frame uses the last non-B frame, in display order, as reference.
  • The following algorithm is applied to the sequence of consecutive B frames between I and/or P frames in display order:
    1. The B frame in the middle of this sequence uses the frame preceding the sequence as its backward reference, and uses the frame following the sequence as its forward reference.
    2. The algorithm is executed recursively for the following frame sequences:
      • The B frames of the original sequence preceding the frame in the middle, if any.
      • The B frames of the original sequence following the frame in the middle, if any.
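The recursive assignment in the dyadic reference pattern can be sketched in C. Display positions, the Refs struct, and the choice of the lower middle frame for even-length runs are assumptions of this illustration, not defined by the specification:

```c
#include <assert.h>

typedef struct { int backward, forward; } Refs;

/* Assign dyadic references to the B frames at display positions
 * first..last, which lie between the anchor (I or P) frames at display
 * positions prev_anchor and next_anchor. refs is indexed by display
 * position. */
static void assign_dyadic_refs(int first, int last,
                               int prev_anchor, int next_anchor,
                               Refs *refs) {
    if (first > last)
        return;
    /* the B frame in the middle of the sequence; for even-length runs
     * this sketch picks the lower middle frame */
    int mid = (first + last) / 2;
    refs[mid].backward = prev_anchor;
    refs[mid].forward  = next_anchor;
    /* recurse into the B frames preceding and following the middle */
    assign_dyadic_refs(first, mid - 1, prev_anchor, mid, refs);
    assign_dyadic_refs(mid + 1, last, mid, next_anchor, refs);
}
```

For three consecutive B frames at display positions 1..3 between anchors at 0 and 4, frame 2 references (0, 4), frame 1 references (0, 2), and frame 3 references (2, 4).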

The application can provide guidance to the implementation’s rate control algorithm about the structure of the GOP used by the application. Such guidance does not mandate the use of that specific GOP structure, as the picture type of individual encoded pictures remains application-controlled; however, any deviation from the provided guidance may result in undesired rate control behavior, including, but not limited to, the implementation failing to conform to the expected average or target bitrates or other rate control parameters specified by the application.

When an H.265 encode session is used to encode multiple temporal sub-layers, it is also common practice to follow a regular pattern for the H.265 temporal ID for the encoded pictures in display order when encoding subsequent frames. This pattern is referred to as the temporal GOP. The most common temporal layer pattern used is as follows:

Dyadic Temporal Sub-Layer Pattern
  • The number of frames in the temporal GOP is 2^(n-1), where n is the number of temporal sub-layers.
  • The ith frame in the temporal GOP uses temporal ID t, if and only if the index of the least significant bit set in i equals n-t-1, except for the first frame, which is the only frame in the temporal GOP using temporal ID zero.
  • The ith frame in the temporal GOP uses the rth frame as reference, where r is calculated from i by clearing the least significant bit set in it, except for the first frame in the temporal GOP, which uses the first frame of the previous temporal GOP, if any, as reference.
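The dyadic temporal sub-layer pattern can be expressed with bit operations on the frame index i within the temporal GOP. This sketch assumes 0-based indexing of the frames within the temporal GOP and is an illustration of the described pattern, not part of the API:

```c
#include <assert.h>

/* Temporal ID of the i-th frame (0-based) in a dyadic temporal GOP of
 * 2^(n-1) frames, where n is the number of temporal sub-layers: the
 * index of the least significant set bit of i equals n-t-1. */
static int temporal_id(int i, int n) {
    if (i == 0)
        return 0;            /* only the first frame uses temporal ID 0 */
    int lsb = 0;             /* index of the least significant set bit */
    while (((i >> lsb) & 1) == 0)
        lsb++;
    return n - 1 - lsb;
}

/* Display index of the reference frame used by the i-th frame: i with
 * its least significant set bit cleared. For i == 0 the reference is
 * instead the first frame of the previous temporal GOP, if any. */
static int reference_index(int i) {
    return i & (i - 1);
}
```

With n = 3 sub-layers the temporal GOP spans 4 frames with temporal IDs 0, 2, 1, 2 in display order; frame 1 and frame 2 reference frame 0, and frame 3 references frame 2.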

Multi-layer rate control and multi-layer coding are typically used for streaming cases where low latency is expected, hence B pictures with forward prediction are usually not used.

VkVideoEncodeH265RateControlInfoKHRStructure describing H.265 stream rate control parameters
VkVideoEncodeH265RateControlFlagBitsKHRH.265 encode rate control bits
VkVideoEncodeH265RateControlFlagsKHRBitmask specifying H.265 encode rate control flags

Rate Control Layers

VkVideoEncodeH265RateControlLayerInfoKHRStructure describing H.265 per-layer rate control parameters
VkVideoEncodeH265QpKHRStructure describing H.265 QP values per picture type
VkVideoEncodeH265FrameSizeKHRStructure describing frame size values per H.265 picture type

GOP Remaining Frames

Besides session level rate control configuration, the application can specify the number of frames per frame type remaining in the group of pictures (GOP).

VkVideoEncodeH265GopRemainingFrameInfoKHRStructure specifying H.265 encode rate control GOP remaining frame counts

H.265 Encode Requirements

This section describes the required H.265 encoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.