Video Coding
Vulkan implementations may expose one or more queue families supporting video coding operations. These operations are performed by recording them into a command buffer within a video coding scope, and submitting them to queues with compatible video coding capabilities.
The Vulkan video functionalities are designed to be made available through a set of APIs built on top of each other, consisting of:
- A core API providing common video coding functionalities,
- APIs providing codec-independent video decode and video encode related functionalities, respectively,
- Additional codec-specific APIs built on top of those.
This chapter details the fundamental components and operations of these.
Video Picture Resources
In the context of video coding, multidimensional arrays of image data that can be used as the source or target of video coding operations are referred to as video picture resources. They may store additional metadata that includes implementation-private information used during the execution of video coding operations, as discussed later.
Video picture resources are backed by VkImage objects. Individual subregions of VkImageView objects created from such resources can be used as decode output pictures, encode input pictures, reconstructed pictures, and/or reference pictures.
The parameters of a video picture resource are specified using a
VkVideoPictureResourceInfoKHR
structure.
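As an informal illustration, the following sketch fills in a picture resource for a decode output picture; the image view handle and the coded dimensions are assumptions supplied by the application.

```c
// Hypothetical decode output picture; decodeOutputImageView and the coded
// dimensions are assumptions.
VkVideoPictureResourceInfoKHR pictureResource = {
    .sType            = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
    .pNext            = NULL,
    .codedOffset      = { 0, 0 },              // offset of the subregion within the image
    .codedExtent      = { 1920, 1080 },        // coded picture size
    .baseArrayLayer   = 0,                     // layer relative to the image view's array range
    .imageViewBinding = decodeOutputImageView, // VkImageView backing the picture
};
```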
Decoded Picture Buffer
An integral part of video coding pipelines is the reconstruction of pictures from a compressed video bitstream. A reconstructed picture is a video picture resource resulting from this process.
Such reconstructed pictures can be used as reference pictures in subsequent video coding operations to provide predictions of the values of samples of subsequently decoded or encoded pictures. The correct use of such reconstructed pictures as reference pictures is driven by the video compression standard, the implementation, and the application-specific use cases.
The list of reference pictures used to provide such predictions within a single video coding operation is referred to as the list of active reference pictures.
The decoded picture buffer (DPB) is an indexed data structure that maintains the set of reference pictures available to be used in video coding operations.
Individual indexed entries of the DPB are referred to as the decoded picture buffer (DPB) slots. The range of valid DPB slot indices is between zero and N-1, where N is the capacity of the DPB.
Each DPB slot can refer to a reference picture containing a video frame, or can refer to up to two reference pictures containing the top and/or bottom fields that, when both present, together represent a full video frame.
In Vulkan, the state and the backing store of the DPB are separated as follows:
- The state of individual DPB slots is maintained by video session objects.
- The backing store of DPB slots is provided by subregions of VkImage objects used as video picture resources.
In addition, the implementation may also maintain opaque metadata associated with DPB slots, including:
- Reference picture metadata corresponding to the video picture resource associated with the DPB slot.
Such metadata may be stored by the implementation as part of the DPB slot state maintained by the video session, or as part of the video picture resource backing the DPB slot.
Any metadata stored in the video picture resources backing DPB slots is independent of the video session used to store it, hence such video picture resources can be shared with other video sessions. Correspondingly, any metadata that is dependent on the video session will always be stored as part of the DPB slot state maintained by that video session.
The responsibility of managing the DPB is split between the application and the implementation as follows:
- The application maintains the association between DPB slot indices and corresponding video picture resources.
- The implementation maintains global and per-slot opaque reference picture metadata.
In addition, the application is also responsible for managing the mapping between the codec-specific picture IDs and DPB slots, and any other codec-specific states unless otherwise specified.
DPB Slot States
At a given time, each DPB slot is either in active or inactive state. Initially, all DPB slots managed by a video session are in inactive state.
A DPB slot can be activated by using it as the target of picture reconstruction in a video coding operation with the reconstructed picture requested to be set up as a reference picture, according to the codec-specific semantics. This changes its state to active and associates it with a picture reference to the reconstructed picture.
Some video coding standards allow multiple picture references to be associated with a single DPB slot. In this case the state of the individual picture references can be independently updated.
As an example, H.264 decoding allows associating a separate top field and bottom field picture with the same DPB slot.
As part of reference picture setup, the implementation may also generate reference picture metadata. Such reference picture metadata is specific to each picture reference associated with the DPB slot.
If such a video coding operation completes successfully, the activated DPB slot will have a valid picture reference and the reconstructed picture is associated with the DPB slot. This is true even if the DPB slot is used as the target of a picture reconstruction that only sets up a top field or bottom field reference picture and thus does not yet refer to a complete frame. However, if any data provided as input to such a video coding operation is not compliant with the video compression standard used, that video coding operation may complete unsuccessfully, in which case the activated DPB slot will have an invalid picture reference. This is true even if the DPB slot previously had a valid picture reference to a top field or bottom field reference picture, but the reconstruction of the other field corresponding to the DPB slot failed.
The application can use queries to get feedback about the outcome of video coding operations and use the resulting VkQueryResultStatusKHR value to determine whether the video coding operation completed successfully (result status is positive) or unsuccessfully (result status is negative).
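A minimal sketch of retrieving such a result status, assuming the device and a compatible query pool already exist:

```c
// VkQueryResultStatusKHR is a signed enum: positive means success, negative
// means the operation completed unsuccessfully, zero means not ready.
VkQueryResultStatusKHR status = VK_QUERY_RESULT_STATUS_NOT_READY_KHR;
VkResult result = vkGetQueryPoolResults(
    device, queryPool,
    0 /* firstQuery */, 1 /* queryCount */,
    sizeof(status), &status, sizeof(status),
    VK_QUERY_RESULT_WITH_STATUS_BIT_KHR | VK_QUERY_RESULT_WAIT_BIT);

if (result == VK_SUCCESS && status > 0) {
    // The video coding operation completed successfully.
} else if (status < 0) {
    // The video coding operation completed unsuccessfully.
}
```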
Using a reference picture associated with a DPB slot that has an invalid picture reference as an active reference picture in subsequent video coding operations is legal; however, the contents of the outputs of such operations are undefined, and any DPB slots activated by such video coding operations will also have an invalid picture reference. This is true even if such video coding operations otherwise complete successfully.
A DPB slot can also be deactivated by the application, changing its state to inactive and invalidating any picture references and reference picture metadata associated with the DPB slot.
If an already active DPB slot is used as the target of picture reconstruction in a video coding operation, but the decoded picture is not requested to be set up as a reference picture, according to the codec-specific semantics, no reference picture setup happens and the corresponding picture reference and reference picture metadata are invalidated within the DPB slot. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.
If an already active DPB slot is used as the target of picture reconstruction when decoding a field picture that is not marked as reference, then the behavior is as follows:
- If the DPB slot is currently associated with a frame, then the DPB slot is deactivated.
- If the DPB slot is not currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the decoded picture is a bottom field picture, then the other field picture association of the DPB slot, if any, is not disturbed.
- If the DPB slot is currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is currently associated with a bottom field picture and the decoded picture is a bottom field picture, then that picture association is invalidated, without disturbing the other field picture association, if any. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.
A DPB slot can be activated with a new frame even if it is already active. In this case all previous associations of the DPB slots with reference pictures are replaced with an association with the reconstructed picture used to activate it.
If an already active DPB slot is activated with a reconstructed field picture, then the behavior is as follows:
- If the DPB slot is currently associated with a frame, then that association is replaced with an association with the reconstructed field picture used to activate it.
- If the DPB slot is not currently associated with a top field picture and the DPB slot is activated with a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the DPB slot is activated with a bottom field picture, then the DPB slot is associated with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
- If the DPB slot is currently associated with a top field picture and the DPB slot is activated with a new top field picture, or if the DPB slot is currently associated with a bottom field picture and the DPB slot is activated with a new bottom field picture, then that association is replaced with an association with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
Video Profiles
Chroma subsampling is described in more detail in the Chroma Reconstruction section.
Video Capabilities
Video Coding Capabilities
Video Format Capabilities
Video Sessions
Creating a Video Session
Destroying a Video Session
Video Session Memory Association
After creating a video session object, and before the object can be used to record video coding operations into command buffers, the application must allocate and bind device memory to the video session. Device memory is allocated separately (see Device Memory) and then associated with the video session.
Video sessions may have multiple memory bindings identified by unique unsigned integer values. Appropriate device memory must be bound to each such memory binding before using the video session to record command buffer commands with it.
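A minimal sketch of this process, assuming `device`, `videoSession`, and a `findMemoryType()` helper already exist (error handling and `<stdlib.h>` are omitted for brevity):

```c
// Query the memory bindings required by the video session.
uint32_t bindCount = 0;
vkGetVideoSessionMemoryRequirementsKHR(device, videoSession, &bindCount, NULL);

VkVideoSessionMemoryRequirementsKHR* reqs = calloc(bindCount, sizeof(*reqs));
for (uint32_t i = 0; i < bindCount; ++i)
    reqs[i].sType = VK_STRUCTURE_TYPE_VIDEO_SESSION_MEMORY_REQUIREMENTS_KHR;
vkGetVideoSessionMemoryRequirementsKHR(device, videoSession, &bindCount, reqs);

// Allocate one VkDeviceMemory object per binding and bind all of them.
VkBindVideoSessionMemoryInfoKHR* binds = calloc(bindCount, sizeof(*binds));
for (uint32_t i = 0; i < bindCount; ++i) {
    VkMemoryAllocateInfo allocInfo = {
        .sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize  = reqs[i].memoryRequirements.size,
        .memoryTypeIndex = findMemoryType(reqs[i].memoryRequirements.memoryTypeBits),
    };
    VkDeviceMemory memory = VK_NULL_HANDLE;
    vkAllocateMemory(device, &allocInfo, NULL, &memory);

    binds[i] = (VkBindVideoSessionMemoryInfoKHR){
        .sType           = VK_STRUCTURE_TYPE_BIND_VIDEO_SESSION_MEMORY_INFO_KHR,
        .memoryBindIndex = reqs[i].memoryBindIndex, // unique binding identifier
        .memory          = memory,
        .memoryOffset    = 0,
        .memorySize      = reqs[i].memoryRequirements.size,
    };
}
vkBindVideoSessionMemoryKHR(device, videoSession, bindCount, binds);
```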
Video Profile Compatibility
Resources and query pools used with a particular video session must be compatible with the video profile the video session was created with.
A VkBuffer is compatible with a video profile if it was created with
the VkBufferCreateInfo::pNext
chain including a
VkVideoProfileListInfoKHR structure with its pProfiles
member
containing an element matching the VkVideoProfileInfoKHR structure
chain describing the video profile, and
VkBufferCreateInfo::usage
including at least one bit specific to
video coding usage.
- VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_DECODE_DST_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_ENCODE_DST_BIT_KHR
A VkBuffer is also compatible with a video profile if it was created
with VkBufferCreateInfo::flags
including
VK_BUFFER_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR
.
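A sketch of creating a bitstream buffer compatible with a specific video profile, assuming an H.264 decode use case and an application-provided `bitstreamSize`:

```c
// Codec-specific profile information (H.264 decode is assumed here).
VkVideoDecodeH264ProfileInfoKHR h264Profile = {
    .sType         = VK_STRUCTURE_TYPE_VIDEO_DECODE_H264_PROFILE_INFO_KHR,
    .stdProfileIdc = STD_VIDEO_H264_PROFILE_IDC_HIGH,
    .pictureLayout = VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_PROGRESSIVE_KHR,
};
VkVideoProfileInfoKHR profile = {
    .sType               = VK_STRUCTURE_TYPE_VIDEO_PROFILE_INFO_KHR,
    .pNext               = &h264Profile,
    .videoCodecOperation = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR,
    .chromaSubsampling   = VK_VIDEO_CHROMA_SUBSAMPLING_420_BIT_KHR,
    .lumaBitDepth        = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR,
    .chromaBitDepth      = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR,
};
VkVideoProfileListInfoKHR profileList = {
    .sType        = VK_STRUCTURE_TYPE_VIDEO_PROFILE_LIST_INFO_KHR,
    .profileCount = 1,
    .pProfiles    = &profile,
};

// The profile list in the pNext chain plus a video usage bit makes the
// buffer compatible with the listed profile.
VkBufferCreateInfo bufferInfo = {
    .sType       = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
    .pNext       = &profileList,
    .size        = bitstreamSize, // assumed application-provided size
    .usage       = VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR,
    .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
};
VkBuffer bitstreamBuffer;
vkCreateBuffer(device, &bufferInfo, NULL, &bitstreamBuffer);
```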
A VkImage is compatible with a video profile if it was created with
the VkImageCreateInfo::pNext
chain including a
VkVideoProfileListInfoKHR structure with its pProfiles
member
containing an element matching the VkVideoProfileInfoKHR structure
chain describing the video profile, and VkImageCreateInfo::usage
including at least one bit specific to video coding usage.
- VK_IMAGE_USAGE_VIDEO_DECODE_SRC_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_DST_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_DPB_BIT_KHR
A VkImage is also compatible with a video profile if all of the following conditions are true for the VkImageCreateInfo structure the image was created with:
- VkImageCreateInfo::flags included VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR.
- The list of VkVideoFormatPropertiesKHR structures, obtained by calling vkGetPhysicalDeviceVideoFormatPropertiesKHR with VkPhysicalDeviceVideoFormatInfoKHR::imageUsage equal to the VkImageCreateInfo::usage the image was created with and the VkPhysicalDeviceVideoFormatInfoKHR::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing a single array element specifying the VkVideoProfileInfoKHR structure chain describing the video profile in question, contains an element for which all of the following conditions are true with respect to the VkImageCreateInfo structure the image was created with:
  - VkImageCreateInfo::format equals VkVideoFormatPropertiesKHR::format.
  - VkImageCreateInfo::flags only contains bits also set in VkVideoFormatPropertiesKHR::imageCreateFlags.
  - VkImageCreateInfo::imageType equals VkVideoFormatPropertiesKHR::imageType.
  - VkImageCreateInfo::tiling equals VkVideoFormatPropertiesKHR::imageTiling.
  - VkImageCreateInfo::usage only contains bits also set in VkVideoFormatPropertiesKHR::imageUsageFlags.
While some of these rules allow creating buffer or image resources that may
be compatible with any video profile, applications should still prefer to
include the specific video profiles the buffer or image resource is expected
to be used with (through a VkVideoProfileListInfoKHR structure
included in the pNext
chain of the corresponding create info
structure) whenever the information about the complete set of video profiles
is available at resource creation time, to enable the implementation to
optimize the created resource for the specific use case.
In the absence of that information, the implementation may have to make
conservative decisions about the memory requirements or representation of
the resource.
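A sketch of creating a DPB image compatible with the same video profile, reusing the `profileList` from the buffer example above; the format and extent are assumptions that should instead be taken from vkGetPhysicalDeviceVideoFormatPropertiesKHR for that profile:

```c
VkImageCreateInfo dpbImageInfo = {
    .sType         = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
    .pNext         = &profileList,                       // profile(s) this image will be used with
    .imageType     = VK_IMAGE_TYPE_2D,
    .format        = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, // assumed 4:2:0 8-bit format
    .extent        = { 1920, 1088, 1 },                  // assumed aligned coded size
    .mipLevels     = 1,
    .arrayLayers   = 1,                                  // one layer per DPB slot is also common
    .samples       = VK_SAMPLE_COUNT_1_BIT,
    .tiling        = VK_IMAGE_TILING_OPTIMAL,
    .usage         = VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR |
                     VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR,
    .sharingMode   = VK_SHARING_MODE_EXCLUSIVE,
    .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
};
VkImage dpbImage;
vkCreateImage(device, &dpbImageInfo, NULL, &dpbImage);
```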
A VkImageView is compatible with a video profile if the VkImage it was created from is also compatible with that video profile.
A VkQueryPool is compatible with a video profile if it was created
with the VkQueryPoolCreateInfo::pNext
chain including a
VkVideoProfileInfoKHR structure chain describing the same video
profile, and VkQueryPoolCreateInfo::queryType
having one of the
following values:
- VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR
- VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR
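A sketch of creating a result status query pool compatible with a video profile, reusing the `profile` chain from the earlier examples:

```c
VkQueryPoolCreateInfo queryPoolInfo = {
    .sType      = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO,
    .pNext      = &profile, // VkVideoProfileInfoKHR chain describing the video profile
    .queryType  = VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR,
    .queryCount = 1,
};
VkQueryPool queryPool;
vkCreateQueryPool(device, &queryPoolInfo, NULL, &queryPool);
```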
Video Session Parameters
Video session parameters objects can store preprocessed codec-specific parameters used with a compatible video session, and enable reducing the number of parameters needed to be provided and processed by the implementation while recording video coding operations into command buffers.
Parameters stored in such objects are immutable to facilitate the concurrent use of the stored parameters in multiple threads. At the same time, new parameters can be added to existing objects using the vkUpdateVideoSessionParametersKHR command.
In order to support concurrent use of the stored immutable parameters while
also allowing the video session parameters object to be extended with new
parameters, each video session parameters object maintains an update
sequence counter that is set to 0
at object creation time and must be
incremented by each subsequent update operation.
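For example, a first update to a freshly created H.264 decode parameters object might look like the following sketch; the `newSps` and `newPps` Video Std structures are assumed to be filled by the application:

```c
VkVideoDecodeH264SessionParametersAddInfoKHR h264Add = {
    .sType       = VK_STRUCTURE_TYPE_VIDEO_DECODE_H264_SESSION_PARAMETERS_ADD_INFO_KHR,
    .stdSPSCount = 1,
    .pStdSPSs    = &newSps, // assumed StdVideoH264SequenceParameterSet
    .stdPPSCount = 1,
    .pStdPPSs    = &newPps, // assumed StdVideoH264PictureParameterSet
};
VkVideoSessionParametersUpdateInfoKHR updateInfo = {
    .sType               = VK_STRUCTURE_TYPE_VIDEO_SESSION_PARAMETERS_UPDATE_INFO_KHR,
    .pNext               = &h264Add,
    .updateSequenceCount = 1, // one greater than the previous update sequence counter
};
vkUpdateVideoSessionParametersKHR(device, videoSessionParameters, &updateInfo);
```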
Certain video sequences that adhere to particular video compression standards permit updating previously supplied parameters. If a parameter update is necessary, the application has the following options:
- Cache the set of parameters on the application side and create a new video session parameters object adding all the parameters with appropriate changes, as necessary; or
- Create a new video session parameters object providing only the updated parameters and the previously used object as the template, which ensures that parameters not specified at creation time will be copied unmodified from the template object.
The actual types of parameters that can be stored and the capacity for individual parameter types, and the methods of initializing, updating, and referring to individual parameters are specific to the video codec operation the video session parameters object was created with.
- For VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR these are defined in the H.264 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR these are defined in the H.265 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR these are defined in the AV1 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR these are defined in the H.264 Encode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR these are defined in the H.265 Encode Parameter Sets section.
Video session parameters objects created with an encode operation are further specialized based on the video encode quality level the video session parameters are used with, as implementations may apply different sets of parameter overrides depending on the used quality level. This enables implementations to store the potentially optimized set of parameters in these objects, further limiting the necessary processing required while recording video encode operations into command buffers.
Creating Video Session Parameters
Destroying Video Session Parameters
Updating Video Session Parameters
Video Coding Scope
Applications can record video coding commands for a video session only within a video coding scope.
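A minimal sketch of such a scope, assuming the command buffer, video session, and session parameters handles already exist:

```c
VkVideoBeginCodingInfoKHR beginInfo = {
    .sType                  = VK_STRUCTURE_TYPE_VIDEO_BEGIN_CODING_INFO_KHR,
    .videoSession           = videoSession,
    .videoSessionParameters = videoSessionParameters,
    .referenceSlotCount     = 0,    // DPB slots bound for the duration of the scope
    .pReferenceSlots        = NULL,
};
vkCmdBeginVideoCodingKHR(cmdBuffer, &beginInfo);

// Reset the video session state the first time the session is used.
VkVideoCodingControlInfoKHR controlInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_CODING_CONTROL_INFO_KHR,
    .flags = VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR,
};
vkCmdControlVideoCodingKHR(cmdBuffer, &controlInfo);

// ... vkCmdDecodeVideoKHR or vkCmdEncodeVideoKHR calls go here ...

VkVideoEndCodingInfoKHR endInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_END_CODING_INFO_KHR,
};
vkCmdEndVideoCodingKHR(cmdBuffer, &endInfo);
```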
Video Coding Control
Inline Queries
If a video session was created with
VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR
, beginning queries
using commands such as vkCmdBeginQuery within a video coding scope is
not allowed.
Instead, queries are executed inline by including an instance of the
VkVideoInlineQueryInfoKHR structure in the pNext
chain of the
parameters of one of the video coding commands, with its queryPool
member set to a valid VkQueryPool
handle.
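A sketch of chaining an inline query into the parameters of a decode command, assuming an existing result status query pool:

```c
VkVideoInlineQueryInfoKHR inlineQuery = {
    .sType      = VK_STRUCTURE_TYPE_VIDEO_INLINE_QUERY_INFO_KHR,
    .queryPool  = queryPool, // valid VkQueryPool handle
    .firstQuery = 0,
    .queryCount = 1,
};
VkVideoDecodeInfoKHR decodeInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
    .pNext = &inlineQuery, // executed inline instead of vkCmdBeginQuery/vkCmdEndQuery
    /* ... remaining decode parameters ... */
};
```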
Video Decode Operations
Video decode operations consume compressed video data from a video bitstream buffer and zero or more reference pictures, and produce a decode output picture and an optional reconstructed picture.
Such decode output pictures can be shared with the Decoded Picture Buffer, and can also be used as the input of video encode operations, with graphics or compute operations, or with Window System Integration APIs, depending on the capabilities of the implementation.
Video decode operations may access the following resources in the
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR
stage:
- The source video bitstream buffer range and the image subregions corresponding to the list of active reference pictures with access VK_ACCESS_2_VIDEO_DECODE_READ_BIT_KHR.
- The image subregions corresponding to the target decode output picture and reconstructed picture with access VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR.
The image subresource of each video picture resource accessed by the video decode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:
- If the image subresource is used in the video decode operation only as decode output picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR layout.
- If the image subresource is used in the video decode operation both as decode output picture and reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
- If the image subresource is used in the video decode operation only as reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
- If the image subresource is used in the video decode operation as a reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
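A sketch of transitioning a DPB image subresource into the layout expected by decode operations, assuming synchronization2 is enabled and the `dpbImage` handle exists:

```c
VkImageMemoryBarrier2 barrier = {
    .sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER_2,
    .srcStageMask        = VK_PIPELINE_STAGE_2_NONE,
    .srcAccessMask       = VK_ACCESS_2_NONE,
    .dstStageMask        = VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
    .dstAccessMask       = VK_ACCESS_2_VIDEO_DECODE_READ_BIT_KHR |
                           VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR,
    .oldLayout           = VK_IMAGE_LAYOUT_UNDEFINED,
    .newLayout           = VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image               = dpbImage,
    .subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
};
VkDependencyInfo dep = {
    .sType                   = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
    .imageMemoryBarrierCount = 1,
    .pImageMemoryBarriers    = &barrier,
};
vkCmdPipelineBarrier2(cmdBuffer, &dep);
```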
A video decode operation may complete unsuccessfully. In this case the decode output picture will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.
Codec-Specific Semantics
The following aspects of video decode operations are codec-specific:
- The interpretation of the contents of the source video bitstream buffer range.
- The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
- The construction and interpretation of information related to the decode output picture and the generation of picture data to the corresponding image subregion.
- The decision on reference picture setup.
- The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.
These codec-specific behaviors are defined for each video codec operation separately.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.264 Decode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.265 Decode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the AV1 Decode Operations section.
Video Decode Operation Steps
Each video decode operation performs the following steps in the
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR
stage:
- Reads the encoded video data from the source video bitstream buffer range;
- Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process;
- Writes the decoded picture data to the decode output picture, and optionally to the reconstructed picture, if one is specified and is different from the decode output picture, according to the codec-specific semantics;
- If reference picture setup is requested, activates the DPB slot index specified in the reconstructed picture information with the reconstructed picture.
When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.
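A sketch of recording one such decode operation; the bitstream buffer, picture resources, and the codec-specific structures chained through pNext are assumptions set up by the application beforehand:

```c
VkVideoReferenceSlotInfoKHR setupSlot = {
    .sType            = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
    .pNext            = &h264DpbSlotInfo,    // e.g. VkVideoDecodeH264DpbSlotInfoKHR
    .slotIndex        = 0,                   // DPB slot to activate on reference picture setup
    .pPictureResource = &dpbPictureResource, // reconstructed picture backing store
};
VkVideoDecodeInfoKHR decodeInfo = {
    .sType               = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
    .pNext               = &h264PictureInfo,     // e.g. VkVideoDecodeH264PictureInfoKHR
    .srcBuffer           = bitstreamBuffer,      // source video bitstream buffer
    .srcBufferOffset     = 0,
    .srcBufferRange      = bitstreamSize,
    .dstPictureResource  = decodeOutputResource, // decode output picture
    .pSetupReferenceSlot = &setupSlot,           // requests reference picture setup
    .referenceSlotCount  = 0,                    // no active reference pictures in this sketch
    .pReferenceSlots     = NULL,
};
vkCmdDecodeVideoKHR(cmdBuffer, &decodeInfo);
```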
Capabilities
Video Decode Commands
H.264 Decode Operations
Video decode operations using an H.264 decode profile can be used to decode elementary video stream sequences compliant to the ITU-T H.264 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH264SequenceParameterSet structure corresponding to the active SPS specifying the H.264 sequence parameter set.
  - The StdVideoH264PictureParameterSet structure corresponding to the active PPS specifying the H.264 picture parameter set.
  - The StdVideoDecodeH264PictureInfo structure specifying the H.264 picture information.
  - The StdVideoDecodeH264ReferenceInfo structures specifying the H.264 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the H.264 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the H.264 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.264 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
H.264 Decode Bitstream Data Access
If the target decode output picture is a frame, then the video bitstream buffer range should contain a VCL NAL unit comprised of the slice headers and data of a picture representing an entire frame, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
If the target decode output picture is a field, then the video bitstream buffer range should contain a VCL NAL unit comprised of the slice headers and data of a picture representing a field, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
The offsets provided in
VkVideoDecodeH264PictureInfoKHR::pSliceOffsets
should specify
the starting offsets corresponding to each slice header within the video
bitstream buffer range.
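A sketch of providing these offsets through the H.264 picture information, assuming a single slice starting at the beginning of the bitstream buffer range and an application-filled `stdPictureInfo`:

```c
uint32_t sliceOffsets[] = { 0 }; // offsets of each slice header within the range
VkVideoDecodeH264PictureInfoKHR h264PictureInfo = {
    .sType           = VK_STRUCTURE_TYPE_VIDEO_DECODE_H264_PICTURE_INFO_KHR,
    .pStdPictureInfo = &stdPictureInfo, // assumed StdVideoDecodeH264PictureInfo
    .sliceCount      = 1,
    .pSliceOffsets   = sliceOffsets,
};
```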
H.264 Decode Picture Data Access
The effective imageOffset
and imageExtent
corresponding to a
decode output picture,
reference picture, or
reconstructed picture used in video decode
operations with an H.264 decode profile are defined
as follows:
- imageOffset is (codedOffset.x, codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a frame.
- imageOffset is (codedOffset.x, codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- imageOffset is (codedOffset.x, codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height / 2), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
Where codedOffset
and codedExtent
are the members of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
However, accesses to image data within a video picture resource happen at
the granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
This means that the complete image subregion accessed by video coding
operations using an H.264 decode profile for the
video picture resource is defined as the set of texels within the coordinate
range:
- ([startX, endX), [startY, endY))
Where:
- startX equals imageOffset.x rounded down to the nearest integer multiple of pictureAccessGranularity.width;
- endX equals imageOffset.x + imageExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- startY equals imageOffset.y rounded down to the nearest integer multiple of pictureAccessGranularity.height;
- endY equals imageOffset.y + imageExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
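The rounding rules above can be written out as the following sketch, where the variables mirror the definitions and the subresource dimensions are assumed to be known:

```c
uint32_t gw = pictureAccessGranularity.width;
uint32_t gh = pictureAccessGranularity.height;

// Round the start down and the end up to the access granularity.
uint32_t startX = ((uint32_t)imageOffset.x / gw) * gw;
uint32_t endX   = (((uint32_t)imageOffset.x + imageExtent.width  + gw - 1) / gw) * gw;
uint32_t startY = ((uint32_t)imageOffset.y / gh) * gh;
uint32_t endY   = (((uint32_t)imageOffset.y + imageExtent.height + gh - 1) / gh) * gh;

// Clamp to the dimensions of the image subresource bound through
// VkVideoPictureResourceInfoKHR.
if (endX > subresourceWidth)  endX = subresourceWidth;
if (endY > subresourceHeight) endY = subresourceHeight;
```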
In case of video decode operations using an H.264
decode profile, any access to a picture at the coordinates
(x
,y
), as defined by the ITU-T H.264
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
specified below:
- (x, y), if the accessed picture represents a frame.
- (x, y × 2), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- (x, y × 2 + 1), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- (x, y), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
- (codedOffset.x + x, codedOffset.y + y), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
Where codedOffset
is the member of the corresponding
VkVideoPictureResourceInfoKHR structure.
H.264 Decode Profile
H.264 Decode Capabilities
H.264 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
can contain the following types of parameters:
H.264 Sequence Parameter Sets (SPS)
Represented by StdVideoH264SequenceParameterSet
structures and
interpreted as follows:
- reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
- seq_parameter_set_id is used as the key of the SPS entry;
- level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
- if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
- if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
  - reserved1 is used only for padding purposes and is otherwise ignored;
  - if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - reserved1 is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
- all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.
H.264 Picture Parameter Sets (PPS)
Represented by StdVideoH264PictureParameterSet
structures and
interpreted as follows:
- the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
- if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
- all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.
H.264 Decoding Parameters
H.264 Decode Requirements
This section describes the required H.264 decoding capabilities for
physical devices that have at least one queue family that supports the video
codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
, as
returned by vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
Video Std Header Name | Version
---|---
vulkan_video_codec_h264std_decode | 1.0.0

Video Capability | Requirement | Requirement Type
---|---|---
 | - | min
 | 4096 | max
 | 4096 | max
 | (64,64) | max
 | - | max
 | - | min
 | 0 | min
 | 0 | min
 |  | min
 |  | min
 | (0,0) except for profiles using VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR | implementation-dependent
H.265 Decode Operations
Video decode operations using an H.265 decode profile can be used to decode elementary video stream sequences compliant to the ITU-T H.265 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.265 Specification:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH265VideoParameterSet structure corresponding to the active VPS specifying the H.265 video parameter set.
  - The StdVideoH265SequenceParameterSet structure corresponding to the active SPS specifying the H.265 sequence parameter set.
  - The StdVideoH265PictureParameterSet structure corresponding to the active PPS specifying the H.265 picture parameter set.
  - The StdVideoDecodeH265PictureInfo structure specifying the H.265 picture information.
  - The StdVideoDecodeH265ReferenceInfo structures specifying the H.265 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the H.265 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the H.265 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.265 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.265 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
H.265 Decode Bitstream Data Access
The video bitstream buffer range should contain a VCL NAL unit comprised of the slice segment headers and data of a picture representing a frame, as defined in sections 7.3.6 and 7.3.8, and this data is interpreted as defined in sections 7.4.7 and 7.4.9 of the ITU-T H.265 Specification, respectively.
The offsets provided in
VkVideoDecodeH265PictureInfoKHR::pSliceSegmentOffsets
should
specify the starting offsets corresponding to each slice segment header
within the video bitstream buffer range.
H.265 Decode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of a
decode output picture,
reference picture, or
reconstructed picture accessed by video coding
operations using an H.265 decode profile is defined
as the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
Where codedExtent
is the member of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
In case of video decode operations using an H.265
decode profile, any access to a picture at the coordinates
(x
,y
), as defined by the ITU-T H.265
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
(x
,y
).
H.265 Decode Profile
H.265 Decode Capabilities
H.265 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR
can contain the following types of parameters:
H.265 Video Parameter Sets (VPS)
Represented by StdVideoH265VideoParameterSet
structures and interpreted
as follows:
- reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
- vps_video_parameter_set_id is used as the key of the VPS entry;
- the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to vps_max_latency_increase_plus1, vps_max_dec_pic_buffering_minus1, and vps_max_num_reorder_pics, respectively, as defined in section 7.4.3.1 of the ITU-T H.265 Specification;
- the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
  - reserved is used only for padding purposes and is otherwise ignored;
  - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
  - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
  - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
  - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
    - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
  - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of vps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure and each element is interpreted as follows:
    - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
- the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
  - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
- all other members of StdVideoH265VideoParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
H.265 Sequence Parameter Sets (SPS)
Represented by StdVideoH265SequenceParameterSet
structures and
interpreted as follows:
- reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
- the pair constructed from sps_video_parameter_set_id and sps_seq_parameter_set_id is used as the key of the SPS entry;
- the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
  - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
- the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to sps_max_latency_increase_plus1, sps_max_dec_pic_buffering_minus1, and sps_max_num_reorder_pics, respectively, as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
- if flags.sps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
- pShortTermRefPicSet is a pointer to an array of num_short_term_ref_pic_sets number of StdVideoH265ShortTermRefPicSet structures where each element is interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - used_by_curr_pic_flag is a bitmask where bit index i corresponds to used_by_curr_pic_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - use_delta_flag is a bitmask where bit index i corresponds to use_delta_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - used_by_curr_pic_s0_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s0_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - used_by_curr_pic_s1_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s1_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265ShortTermRefPicSet are interpreted as defined in section 7.4.8 of the ITU-T H.265 Specification;
- if flags.long_term_ref_pics_present_flag is set, then the StdVideoH265LongTermRefPicsSps structure pointed to by pLongTermRefPicsSps is interpreted as follows:
  - used_by_curr_pic_lt_sps_flag is a bitmask where bit index i corresponds to used_by_curr_pic_lt_sps_flag[i] as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265LongTermRefPicsSps are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
- if flags.vui_parameters_present_flag is set, then the StdVideoH265SequenceParameterSetVui structure pointed to by pSequenceParameterSetVui is interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of sps_max_sub_layers_minus1 + 1 number of StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
  - all other members of pSequenceParameterSetVui are interpreted as defined in section E.3.1 of the ITU-T H.265 Specification;
- if flags.sps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
- all other members of StdVideoH265SequenceParameterSet are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification.
H.265 Picture Parameter Sets (PPS)
Represented by StdVideoH265PictureParameterSet
structures and
interpreted as follows:
- reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
- the triplet constructed from sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id is used as the key of the PPS entry;
- if flags.pps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
- if flags.pps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
- all other members of StdVideoH265PictureParameterSet are interpreted as defined in section 7.4.3.3 of the ITU-T H.265 Specification.
H.265 Decoding Parameters
H.265 Decode Requirements
This section describes the required H.265 decoding capabilities for
physical devices that have at least one queue family that supports the video
codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR
, as
returned by vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
Video Std Header Name | Version
---|---
vulkan_video_codec_h265std_decode | 1.0.0

Video Capability | Requirement | Requirement Type
---|---|---
 | - | min
 | 4096 | max
 | 4096 | max
 | (64,64) | max
 | - | max
 | - | min
 | 0 | min
 | 0 | min
 |  | min
 |  | min
AV1 Decode Operations
Video decode operations using an AV1 decode profile can be used to decode elementary video stream sequences compliant with the AV1 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 7 of the AV1 Specification:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoAV1SequenceHeader structure stored in the bound video session parameters object specifying the active sequence header.
  - The StdVideoDecodeAV1PictureInfo structure specifying the AV1 picture information.
  - The StdVideoDecodeAV1ReferenceInfo structures specifying the AV1 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the AV1 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the AV1 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the AV1 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the AV1 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
AV1 Decode Bitstream Data Access
The video bitstream buffer range should contain one or more frame OBUs, comprised of a frame header OBU and tile group OBU, that together represent an entire frame, as defined in sections 5.10, 5.9, and 5.11, and this data is interpreted as defined in sections 6.9, 6.8, and 6.10 of the AV1 Specification, respectively.
The offset specified in
VkVideoDecodeAV1PictureInfoKHR::frameHeaderOffset
should
specify the starting offset of the frame header OBU of the frame.
When the tiles of the frame are encoded into multiple tile groups, each
frame OBU has a separate frame header OBU but their content is expected to
match per the requirements of the AV1 Specification.
Accordingly, the offset specified in frameHeaderOffset
can be the
offset of any of the otherwise identical frame header OBUs when multiple
tile groups are present.
The offsets and sizes provided in
VkVideoDecodeAV1PictureInfoKHR::pTileOffsets
and
VkVideoDecodeAV1PictureInfoKHR::pTileSizes
, respectively,
should specify the starting offsets and sizes corresponding to each tile
within the video bitstream buffer range.
AV1 Decode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of a
decode output picture,
reference picture, or
reconstructed picture accessed by video coding
operations using an AV1 decode profile is defined as
the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
Where codedExtent
is the member of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
In case of video decode operations using an AV1 decode
profile, any access to a picture at the coordinates
(x
,y
), as defined by the AV1
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
(x
,y
).
AV1 Reference Names and Semantics
Individual reference frames used in the decoding process have different
semantics, as defined in section 6.10.24 of the AV1
Specification.
The AV1 semantics associated with a reference picture are indicated by the
corresponding enumeration constant defined in the Video Std enumeration type
StdVideoAV1ReferenceName
:
- STD_VIDEO_AV1_REFERENCE_NAME_INTRA_FRAME identifies the reference used for intra coding (INTRA_FRAME), as defined in sections 2 and 7.11.2 of the AV1 Specification.
- All other enumeration constants refer to backward or forward references used for inter coding, as defined in sections 2 and 7.11.3 of the AV1 Specification:
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME identifies the LAST_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST2_FRAME identifies the LAST2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST3_FRAME identifies the LAST3_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_GOLDEN_FRAME identifies the GOLDEN_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_BWDREF_FRAME identifies the BWDREF_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF2_FRAME identifies the ALTREF2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF_FRAME identifies the ALTREF_FRAME reference
These enumeration constants are not directly used in any APIs but are used to indirectly index into certain Video Std and Vulkan API parameter arrays.
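For instance, a common indexing pattern (a sketch; the picture info members, the one-based offset into the inter reference names, and the DPB slot assignments are assumptions based on how the decode AV1 picture information is typically filled) looks like this:

```c
VkVideoDecodeAV1PictureInfoKHR av1PictureInfo = {
    .sType             = VK_STRUCTURE_TYPE_VIDEO_DECODE_AV1_PICTURE_INFO_KHR,
    .pStdPictureInfo   = &stdAv1PictureInfo, // assumed StdVideoDecodeAV1PictureInfo
    .frameHeaderOffset = 0,
    .tileCount         = 1,
    .pTileOffsets      = tileOffsets,        // assumed tile offset/size arrays
    .pTileSizes        = tileSizes,
};

// Mark all inter reference names as not associated with any DPB slot first.
for (uint32_t i = 0; i < VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR; ++i)
    av1PictureInfo.referenceNameSlotIndices[i] = -1;

// Element i corresponds to the inter reference name with value i + 1, so the
// reference name constant minus one indexes the array (assumed mapping).
av1PictureInfo.referenceNameSlotIndices[STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME - 1]   = 0;
av1PictureInfo.referenceNameSlotIndices[STD_VIDEO_AV1_REFERENCE_NAME_GOLDEN_FRAME - 1] = 1;
```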
AV1 Decode Profile
AV1 Decode Capabilities
AV1 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR
contain a single instance of the following parameter set:
AV1 Sequence Header
Represented by StdVideoAV1SequenceHeader
structures and interpreted as
follows:
- flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
- the StdVideoAV1ColorConfig structure pointed to by pColorConfig is interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - all other members of StdVideoAV1ColorConfig are interpreted as defined in section 6.4.2 of the AV1 Specification;
- if flags.timing_info_present_flag is set, then the StdVideoAV1TimingInfo structure pointed to by pTimingInfo is interpreted as follows:
  - flags.reserved is used only for padding purposes and is otherwise ignored;
  - all other members of StdVideoAV1TimingInfo are interpreted as defined in section 6.4.3 of the AV1 Specification;
- all other members of StdVideoAV1SequenceHeader are interpreted as defined in section 6.4 of the AV1 Specification.
AV1 Decoding Parameters
AV1 Decode Requirements
This section describes the required AV1 decoding capabilities for physical
devices that have at least one queue family that supports the video codec
operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR
, as returned by
vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
Video Std Header Name | Version
---|---
vulkan_video_codec_av1std_decode | 1.0.0

Video Capability | Requirement | Requirement Type
---|---|---
 | - | min
 | 4096 | max
 | 4096 | max
 | (64,64) | max
 | - | max
 | - | min
 | 0 | min
 | 0 | min
 |  | min
 |  | min
Video Encode Operations
Video encode operations consume an encode input picture and zero or more reference pictures, and produce compressed video data to a video bitstream buffer and an optional reconstructed picture.
Such encode input pictures can be used as the output of video decode operations, with graphics or compute operations, or with Window System Integration APIs, depending on the capabilities of the implementation.
Video encode operations may access the following resources in the
VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR
stage:
- The image subregions corresponding to the source encode input picture and the active reference pictures with access VK_ACCESS_2_VIDEO_ENCODE_READ_BIT_KHR.
- The destination video bitstream buffer range and the optional reconstructed picture with access VK_ACCESS_2_VIDEO_ENCODE_WRITE_BIT_KHR.
The image subresource of each video picture resource accessed by the video encode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:
- If the image subresource is used in the video encode operation as an encode input picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR layout.
- If the image subresource is used in the video encode operation as a reconstructed picture or reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_DPB_KHR layout.
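For example, before an image is used as an encode input picture, the application has to transition the corresponding image subresource to the VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR layout. The hedged sketch below shows one way to do this with a synchronization2 image memory barrier; the image handle, its previous usage as a transfer destination, and the single-layer subresource range are assumptions of this illustration.

```c
// Hedged sketch: transitioning an image that will be used as an encode input
// picture into VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR before recording the
// encode operation. The image handle and prior usage are assumptions.
VkImageMemoryBarrier2 toEncodeSrc = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER_2,
    .srcStageMask = VK_PIPELINE_STAGE_2_COPY_BIT,           // assumed prior upload
    .srcAccessMask = VK_ACCESS_2_TRANSFER_WRITE_BIT,
    .dstStageMask = VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR,
    .dstAccessMask = VK_ACCESS_2_VIDEO_ENCODE_READ_BIT_KHR,
    .oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    .newLayout = VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image = encodeInputImage,                               // assumed handle
    .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
};

VkDependencyInfo dependencyInfo = {
    .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
    .imageMemoryBarrierCount = 1,
    .pImageMemoryBarriers = &toEncodeSrc,
};

vkCmdPipelineBarrier2(commandBuffer, &dependencyInfo);
```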
A video encode operation may complete unsuccessfully. In this case the target video bitstream buffer will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.
If a video encode operation completes successfully and the codec-specific parameters provided by the application adhere to the syntactic and semantic requirements defined in the corresponding video compression standard, then the target video bitstream buffer will contain compressed video data after the execution of the video encode operation according to the respective codec-specific semantics.
Codec-Specific Semantics
The following aspects of video encode operations are codec-specific:
- The compressed video data written to the target video bitstream buffer range.
- The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
- The construction and interpretation of information related to the encode input picture and the interpretation of the picture data referred to by the corresponding image subregion.
- The decision on reference picture setup.
- The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.
- Certain aspects of rate control.
These codec-specific behaviors are defined for each video codec operation separately.
- If the used video codec operation is
VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR
, then the codec-specific aspects of the video encoding process are performed as defined in the H.264 Encode Operations section. - If the used video codec operation is
VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR
, then the codec-specific aspects of the video encoding process are performed as defined in the H.265 Encode Operations section.
Video Encode Parameter Overrides
Implementations supporting video encode operations for any particular video codec operation often support only a subset of the available encoding tools defined by the corresponding video compression standards. Accordingly, certain implementation-dependent limitations may apply to codec-specific parameters provided through the structures defined in the Video Std headers corresponding to the used video codec operation.
Exposing all of these restrictions on particular codec-specific parameter values or combinations thereof in the form of application-queryable capabilities is impractical, hence this specification allows implementations to override the value of any of the codec-specific parameters, unless otherwise specified, as long as all of the following conditions are met:
- If the application-provided codec-specific parameters adhere to the syntactic and semantic requirements and rules defined by the used video compression standard, and thus would be usable to produce a video bitstream compliant with that standard, then the codec-specific parameters resulting from the process of implementation overrides must also adhere to the same requirements and rules, and any video bitstream produced using the overridden parameters must also be compliant.
- The overridden codec-specific parameter values must not have an impact on the codec-independent behaviors defined for video encode operations.
- The implementation must not override any codec-specific parameters specified to a command that may cause application-provided codec-specific parameters specified to subsequent commands to no longer adhere to the semantic requirements and rules defined by the used video compression standard, unless the implementation also overrides those parameters to adhere to any such requirements and rules.
- The overridden codec-specific parameter values must not have an impact on the codec-specific picture data access semantics.
- The overridden codec-specific parameter values may change the contents of the codec-specific bitstream elements produced by video encode operations or otherwise retrieved by the application (e.g. using the vkGetEncodedVideoSessionParametersKHR command) but must still adhere to the codec-specific semantics defined for that video codec operation, including, but not limited to, the number, type, and order of the encoded codec-specific bitstream elements.
Besides codec-specific parameter overrides performed for implementation-dependent reasons, applications can enable the implementation to apply additional optimizing overrides that may improve the efficiency or performance of video encoding operations. However, implementations must meet the conditions listed above even in case of such optimizing overrides.
Unless the application opts in for optimizing overrides, implementations are not expected to override any of the codec-specific parameters, except when such overrides are necessary for the correct operation of the video encoder implementation due to limitations of the available encoding tools on that implementation.
Video Encode Operation Steps
Each video encode operation performs the following steps in the
VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR
stage:
- Reads the input picture data from the encode input picture;
- Determines derived encoding quality parameters according to the codec-specific semantics and the current rate control state;
- Compresses the input picture data according to the codec-specific semantics, applying any prediction data read from the active reference pictures and rate control restrictions in the process;
- Writes the encoded bitstream data to the destination video bitstream buffer range;
- Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process, if a reconstructed picture is specified and reference picture setup is requested;
- If reference picture setup is requested, the DPB slot index specified in the reconstructed picture information is activated with the reconstructed picture;
- Writes the reconstructed picture data to the reconstructed picture, if one is specified, according to the codec-specific semantics.
When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.
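The hedged sketch below illustrates how these steps are triggered by recording a single encode operation with vkCmdEncodeVideoKHR; it assumes the command buffer is already inside a video coding scope, and all handles, sizes, and the codec-specific picture information chained through pNext are placeholders.

```c
// Hedged sketch: recording a single video encode operation. The command
// buffer is assumed to be inside a video coding scope (begun with
// vkCmdBeginVideoCodingKHR); all handles, extents, and the codec-specific
// pNext structure are assumptions of this illustration.
VkVideoPictureResourceInfoKHR inputPicture = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
    .codedOffset = { 0, 0 },
    .codedExtent = { 1920, 1080 },             // assumed coded size
    .baseArrayLayer = 0,
    .imageViewBinding = encodeInputImageView,  // assumed image view
};

VkVideoReferenceSlotInfoKHR setupSlot = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
    .slotIndex = 0,                            // DPB slot to activate on setup
    .pPictureResource = &reconstructedPicture, // assumed VkVideoPictureResourceInfoKHR
};

VkVideoEncodeInfoKHR encodeInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_INFO_KHR,
    .pNext = &codecPictureInfo,                // e.g. VkVideoEncodeH264PictureInfoKHR
    .dstBuffer = bitstreamBuffer,              // assumed bitstream buffer
    .dstBufferOffset = 0,
    .dstBufferRange = bitstreamBufferSize,
    .srcPictureResource = inputPicture,
    .pSetupReferenceSlot = &setupSlot,         // request reference picture setup
    .referenceSlotCount = 0,                   // no active reference pictures here
    .pReferenceSlots = NULL,
};

vkCmdEncodeVideoKHR(commandBuffer, &encodeInfo);
```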
Capabilities
Video Encode Quality Levels
Implementations can support more than one video encode quality level for a video encode profile. Video encode quality levels control the number and type of implementation-specific encoding tools and algorithms utilized in the encoding process.
Generally, using higher video encode quality levels may produce higher quality video streams at the cost of additional processing time. However, as the final quality of an encoded picture depends on the contents of the encode input picture, the contents of the active reference pictures, the codec-specific encode parameters, and the particular implementation-specific tools used corresponding to the individual video encode quality levels, there are no guarantees that using a higher video encode quality level will always produce a higher quality encoded picture for any given set of inputs.
Retrieving Encoded Session Parameters
Any codec-specific parameters stored in video session parameters objects may need to be separately encoded and included in the final video bitstream data, depending on the used video compression standard. In such cases the application must call the vkGetEncodedVideoSessionParametersKHR command to retrieve the encoded parameter data from the used video session parameters object in order to be able to produce a compliant video bitstream.
This is needed because implementations may have changed some of the codec-specific parameters stored in the video session parameters object, as defined in the Video Encode Parameter Overrides section. In addition, the vkGetEncodedVideoSessionParametersKHR command enables the application to retrieve the encoded parameter data without having to encode these codec-specific parameters manually.
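A hedged sketch of the usual two-call idiom for retrieving the encoded parameter data follows; the device and video session parameters handles are assumed to exist, and the codec-specific get-info structure (for example, VkVideoEncodeH264SessionParametersGetInfoKHR for H.264) that selects which parameter sets to encode is omitted for brevity.

```c
// Hedged sketch: retrieving the encoded parameter data (e.g. H.264 SPS/PPS)
// from a video session parameters object using the two-call size query idiom.
VkVideoEncodeSessionParametersGetInfoKHR getInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_SESSION_PARAMETERS_GET_INFO_KHR,
    .videoSessionParameters = videoSessionParameters, // assumed handle
};

VkVideoEncodeSessionParametersFeedbackInfoKHR feedback = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_SESSION_PARAMETERS_FEEDBACK_INFO_KHR,
};

size_t dataSize = 0;
vkGetEncodedVideoSessionParametersKHR(device, &getInfo, &feedback, &dataSize, NULL);

void* encodedParams = malloc(dataSize); // assumes <stdlib.h>
vkGetEncodedVideoSessionParametersKHR(device, &getInfo, &feedback, &dataSize, encodedParams);

// feedback.hasOverrides indicates whether the implementation overrode any of
// the application-provided codec-specific parameters.
```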
Video Encode Commands
Video Encode Rate Control
The size of the encoded bitstream data produced by video encode operations is a function of the following set of constraints:
- The capabilities of the compression algorithms defined and employed by the used video compression standard;
- Restrictions imposed by the selected video profile according to the rules defined by the used video compression standard;
- Further restrictions imposed by the capabilities supported by the implementation for the selected video profile;
- The image data in the encode input picture and the set of active reference pictures (as these affect the effectiveness of the compression algorithms employed by the video encode operations);
- The set of codec-specific and codec-independent encoding parameters provided by the application.
These also inherently define the set of decoder capabilities required for reconstructing and processing the picture data in the encoded bitstream.
Video coding uses bitrate as the quantitative metric associated with encoded bitstream data size: it expresses the rate at which video bitstream data can be transferred or processed, measured in bits per second. This bitrate is a function of both the encoded bitstream data size of the encoded pictures and the frame rate used by the video sequence.
Rate control algorithms are used by video encode operations to enable adjusting encoding parameters to achieve a target bitrate, or otherwise directly or indirectly control the bitrate of the generated video bitstream data. These algorithms are usually not defined by the used video compression standard, although some video compression standards do provide non-normative guidelines for implementations.
Accordingly, this specification does not mandate implementations to produce identical encoded bitstream data outputs in response to video encode operations, however, it does define a set of codec-independent and codec-specific parameters that enable the application to control the behavior of the rate control algorithms supported by the implementation. Some of these parameters guarantee certain implementation behavior while others provide guidance for implementations to apply various rate control heuristics.
To achieve the desired rate control behavior and hit the set bitrate targets, applications need to configure the rate control parameters appropriately and follow the promises they make to the implementation through the parameters that provide guidance to the implementation's rate control algorithms and heuristics. In addition, the behavior of rate control may differ across implementations even if the capabilities of the used video profile match between those implementations, because implementations may apply different rate control algorithms or heuristics internally; thus, even the same set of guidance parameter values may have different effects on the rate control behavior across implementations.
Rate Control Modes
After a video session is reset to the initial state, the default behavior and parameters of video encode rate control are entirely implementation-dependent and the application cannot affect the bitrate or quality parameters of the encoded bitstream data produced by video encode operations unless the application changes the rate control configuration of the video session, as described in the Video Coding Control section.
For each supported video profile, the implementation may expose a set of rate control modes that are available for use by the application when encoding bitstreams targeting that video profile. These modes allow using different rate control algorithms that fall into one of the following two categories:
- Per-operation rate control
- Stream-level rate control
In case of per-operation rate control, the bitrate of the generated video bitstream data is indirectly controlled by quality, size, or other encoding parameters specified by the application for each individual video encode operation.
In case of stream-level rate control, the application can directly specify target bitrates, in addition to other encoding parameters, to control the behavior of the rate control algorithm used by the implementation across multiple video encode operations.
Leaky Bucket Model
Video encoding implementations use the leaky bucket model for stream-level rate control. The leaky bucket is a concept referring to the interface between the video encoder and the consumer (for example, a network connection): the video encoder produces encoded bitstream data corresponding to the encoded pictures and adds it to the leaky bucket, while its contents are drained by the consumer.
Analogously, a similar leaky bucket is considered to exist at the input interface of a video decoder, into which encoded bitstream data is continuously added and is subsequently consumed by the video decoder. It is desirable to avoid overflowing or underflowing this leaky bucket because:
- In case of an underflow, the video decoder will be unable to consume encoded bitstream data in order to decode pictures (and optionally display them).
- In case of an overflow, the leaky bucket will be unable to accommodate more encoded bitstream data and such data may need to be thrown away, leading to the loss of the corresponding encoded pictures.
These requirements can be satisfied by imposing various constraints on the encoder-side leaky bucket to avoid its overflow or underflow, depending on the used rate control algorithm and codec parameters. However, enumerating these constraints is outside the scope of this specification.
The term virtual buffer is often used as an alternative to refer to the leaky bucket.
This virtual buffer model is defined by the following parameters:
- The bitrate (R) at which the encoded bitstream is expected to be processed.
- The size (B) of the virtual buffer.
- The initial occupancy (F) of the virtual buffer.
In this model the virtual buffer is used to smooth out fluctuations in the bitrate of the encoded bitstream over time without experiencing buffer overflow or underflow, as long as the bitrate of the encoded stream does not diverge from the target bitrate for extended periods of time.
This buffering may inherently impose a processing delay, as the goal of the model is to enable decoders to maintain a consistent processing rate for an encoded bitstream with a varying data rate.
The initial or start-up delay (D) is computed as:

D = F / R
Applications need to configure the virtual buffer with sufficient size to avoid or minimize buffer overflows and underflows while also keeping it small enough to meet their latency goals.
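As a simple illustration of the relationship above (numbers chosen purely for illustration): with a bitrate of R = 5 Mbit/s, a virtual buffer size of B = 10 Mbit, and an initial occupancy of F = 5 Mbit, the start-up delay is D = 5 Mbit / 5 Mbit/s = 1 second. Halving the initial occupancy would halve the start-up delay, at the cost of leaving less margin against underflow if the encoded data rate temporarily exceeds the target bitrate.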
Rate Control Layers
Some video compression standards and video profiles allow associating encoded pictures with specific video coding layers. The name, identification, and semantics associated with such video coding layers are defined by the corresponding video compression standards.
Analogously, stream-level rate control can be configured to use one or more rate control layers:
- When a single rate control layer is configured, it is applied to all encoded pictures, regardless of the picture’s video coding layer. In this case the distribution of the available bitrate budget across video coding layers is implementation-dependent.
- When multiple rate control layers are configured, each rate control layer is applied to the corresponding video coding layer, i.e. only across encoded pictures pertaining to the corresponding video coding layer.
Individual rate control layers are identified using layer indices between zero and N-1, where N is the number of active rate control layers.
Rate control layers are only applicable when using stream-level rate control modes.
Rate Control State
Rate control state is maintained by the implementation in the
video session objects and its parameters are specified
using an instance of the VkVideoEncodeRateControlInfoKHR
structure.
The complete rate control state of a video session is defined by the
following set of parameters:
- The values of the members of the VkVideoEncodeRateControlInfoKHR structure used to configure the rate control state.
- The values of the members of any VkVideoEncodeRateControlLayerInfoKHR structures specified in VkVideoEncodeRateControlInfoKHR::pLayers used to configure the state of individual rate control layers.
- If the video session was created with an H.264 encode profile:
  - The values of the members of the VkVideoEncodeH264RateControlInfoKHR structure, if one is specified in the pNext chain of the VkVideoEncodeRateControlInfoKHR structure used to configure the rate control state.
  - The values of the members of any VkVideoEncodeH264RateControlLayerInfoKHR structures included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure used to configure the state of a rate control layer.
- If the video session was created with an H.265 encode profile:
  - The values of the members of the VkVideoEncodeH265RateControlInfoKHR structure, if one is specified in the pNext chain of the VkVideoEncodeRateControlInfoKHR structure used to configure the rate control state.
  - The values of the members of any VkVideoEncodeH265RateControlLayerInfoKHR structures included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure used to configure the state of a rate control layer.
Two rate control states match if all the parameters listed above match between them.
Rate Control Layer State
The configuration of individual rate control layers is specified using an
instance of the VkVideoEncodeRateControlLayerInfoKHR
structure.
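Putting the above together, the following hedged sketch configures single-layer stream-level rate control for a video session from within a video coding scope; the chosen mode, bitrates, frame rate, and virtual buffer sizes are illustrative values only, and codec-specific rate control structures (such as the H.264 or H.265 ones listed above) would additionally be chained into the respective pNext chains.

```c
// Hedged sketch: configuring single-layer VBR stream-level rate control for a
// video session. Bitrates, frame rate, and virtual buffer sizes are
// illustrative assumptions.
VkVideoEncodeRateControlLayerInfoKHR layer = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_RATE_CONTROL_LAYER_INFO_KHR,
    .averageBitrate = 5000000,      // 5 Mbit/s target
    .maxBitrate = 8000000,          // 8 Mbit/s peak
    .frameRateNumerator = 30,
    .frameRateDenominator = 1,
};

VkVideoEncodeRateControlInfoKHR rateControlInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_RATE_CONTROL_INFO_KHR,
    .rateControlMode = VK_VIDEO_ENCODE_RATE_CONTROL_MODE_VBR_BIT_KHR,
    .layerCount = 1,
    .pLayers = &layer,
    .virtualBufferSizeInMs = 2000,          // leaky bucket size B expressed in time
    .initialVirtualBufferSizeInMs = 1000,   // initial occupancy F expressed in time
};

VkVideoCodingControlInfoKHR codingControlInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_CODING_CONTROL_INFO_KHR,
    .pNext = &rateControlInfo,
    .flags = VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR,
};

vkCmdControlVideoCodingKHR(commandBuffer, &codingControlInfo);
```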
H.264 Encode Operations
Video encode operations using an H.264 encode profile can be used to encode elementary video stream sequences compliant to the ITU-T H.264 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video encode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH264SequenceParameterSet structure corresponding to the active SPS specifying the H.264 sequence parameter set.
  - The StdVideoH264PictureParameterSet structure corresponding to the active PPS specifying the H.264 picture parameter set.
  - The StdVideoEncodeH264PictureInfo structure specifying the H.264 picture information.
  - The StdVideoEncodeH264SliceHeader structures specifying the H.264 slice header parameters for each encoded H.264 slice.
  - The StdVideoEncodeH264ReferenceInfo structures specifying the H.264 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The encoded bitstream data is written to the destination video bitstream buffer range as defined in the H.264 Encode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used encode input picture, active reference pictures, and optional reconstructed picture is accessed as defined in the H.264 Encode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.264 picture information.
If the parameters adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video encode operation will complete successfully. Otherwise, the video encode operation may complete unsuccessfully.
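As an illustration of how these codec-specific parameters are provided to the encode operation, the hedged sketch below chains the H.264 picture information and a single slice entry into the encode operation; the Video Std structures are assumed to be populated elsewhere according to the referenced semantics.

```c
// Hedged sketch: providing the H.264 codec-specific picture information for a
// video encode operation by chaining it into VkVideoEncodeInfoKHR::pNext.
// A single slice is encoded here; the Std structures are placeholders.
StdVideoEncodeH264PictureInfo stdPictureInfo = { /* H.264 picture information */ 0 };
StdVideoEncodeH264SliceHeader stdSliceHeader = { /* H.264 slice header parameters */ 0 };

VkVideoEncodeH264NaluSliceInfoKHR sliceInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_NALU_SLICE_INFO_KHR,
    .constantQp = 0,                    // only used with disabled rate control
    .pStdSliceHeader = &stdSliceHeader,
};

VkVideoEncodeH264PictureInfoKHR h264PictureInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_PICTURE_INFO_KHR,
    .naluSliceEntryCount = 1,
    .pNaluSliceEntries = &sliceInfo,
    .pStdPictureInfo = &stdPictureInfo,
    .generatePrefixNalu = VK_FALSE,
};

// encodeInfo is the VkVideoEncodeInfoKHR passed to vkCmdEncodeVideoKHR
encodeInfo.pNext = &h264PictureInfo;
```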
H.264 Encode Parameter Overrides
Implementations may override, unless otherwise specified, any of the H.264 encode parameters specified in the following Video Std structures:
- StdVideoH264SequenceParameterSet
- StdVideoH264PictureParameterSet
- StdVideoEncodeH264PictureInfo
- StdVideoEncodeH264SliceHeader
- StdVideoEncodeH264ReferenceInfo
All such H.264 encode parameter overrides must fulfill the conditions defined in the Video Encode Parameter Overrides section.
In addition, implementations must not override any of the following H.264 encode parameters:
- StdVideoEncodeH264PictureInfo::primary_pic_type
- StdVideoEncodeH264SliceHeader::slice_type
In case of H.264 encode parameters stored in video session parameters objects, applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened. If the query indicates that implementation overrides were applied, then the application needs to retrieve and use the encoded H.264 parameter sets in the bitstream in order to be able to produce a compliant H.264 video bitstream using the H.264 encode parameters stored in the video session parameters object.
In case of any H.264 encode parameters stored in the encoded bitstream
produced by video encode operations, if the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to those H.264 encode parameters.
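A hedged sketch of creating a query pool capable of reporting this feedback follows; the chained encodeProfileInfo is an assumed, already-populated VkVideoProfileInfoKHR describing the H.264 encode profile.

```c
// Hedged sketch: creating a query pool that can report, among other encode
// feedback values, whether bitstream-level parameter overrides were applied.
VkQueryPoolVideoEncodeFeedbackCreateInfoKHR feedbackCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_QUERY_POOL_VIDEO_ENCODE_FEEDBACK_CREATE_INFO_KHR,
    .pNext = &encodeProfileInfo, // assumed VkVideoProfileInfoKHR for the encode profile
    .encodeFeedbackFlags =
        VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR |
        VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR,
};

VkQueryPoolCreateInfo queryPoolCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO,
    .pNext = &feedbackCreateInfo,
    .queryType = VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR,
    .queryCount = 1,
};

VkQueryPool encodeFeedbackQueryPool = VK_NULL_HANDLE;
vkCreateQueryPool(device, &queryPoolCreateInfo, NULL, &encodeFeedbackQueryPool);

// The query is then begun/ended around vkCmdEncodeVideoKHR and its results
// retrieved with vkGetQueryPoolResults, yielding one value per enabled
// feedback flag.
```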
H.264 Encode Bitstream Data Access
Each video encode operation writes one or more VCL NAL units comprising the slice headers and data of the encoded picture, in the format defined in sections 7.3.3 and 7.3.4, according to the semantics defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
The number of VCL NAL units written is specified by
VkVideoEncodeH264PictureInfoKHR::naluSliceEntryCount
.
In addition, if VkVideoEncodeH264PictureInfoKHR::generatePrefixNalu is set to VK_TRUE for the video encode operation, then an additional prefix NAL unit is written before each VCL NAL unit corresponding to an individual slice, in the format defined in section 7.3.2.12 and according to the semantics defined in section 7.4.2.12 of the ITU-T H.264 Specification.
H.264 Encode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of an encode input picture, reference picture, or reconstructed picture accessed by video coding operations using an H.264 encode profile is defined as the set of texels within the coordinate range:

- ([0, endX), [0, endY))

Where:

- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;

Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
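The following hedged sketch computes endX and endY as described above; the subresourceWidth and subresourceHeight variables describing the referenced image subresource are assumptions of this illustration.

```c
// Hedged sketch: computing the accessed texel range for a picture, following
// the rounding and clamping rules above.
static inline uint32_t roundUpToMultiple(uint32_t value, uint32_t granularity)
{
    return ((value + granularity - 1) / granularity) * granularity;
}

uint32_t endX = roundUpToMultiple(pictureResource.codedExtent.width,
                                  capabilities.pictureAccessGranularity.width);
uint32_t endY = roundUpToMultiple(pictureResource.codedExtent.height,
                                  capabilities.pictureAccessGranularity.height);

if (endX > subresourceWidth)  endX = subresourceWidth;   // clamp to subresource width
if (endY > subresourceHeight) endY = subresourceHeight;  // clamp to subresource height
```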
In case of video encode operations using an H.264
encode profile, any access to a picture at the coordinates
(x
,y
), as defined by the ITU-T H.264
Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
(x
,y
).
Implementations may choose not to access some or all texels within particular reference pictures available to a video encode operation (e.g. due to video encode parameter overrides restricting the effective set of used reference pictures, or if the encoding algorithm chooses not to use certain subregions of the reference picture data for sample prediction).
H.264 Frame, Picture, and Slice
H.264 pictures are partitioned into slices, as defined in section 6.3 of the ITU-T H.264 Specification.
For the purposes of this specification, the H.264 slices comprising a picture are referred to as the picture partitions of the picture.
Video encode operations using an H.264 encode profile can encode slices of different types, as defined in section 7.4.3 of the ITU-T H.264 Specification, by specifying the corresponding enumeration constant value in StdVideoEncodeH264SliceHeader::slice_type in the H.264 slice header parameters from the Video Std enumeration type StdVideoH264SliceType:
-
STD_VIDEO_H264_SLICE_TYPE_P
indicates that the slice is a P slice as defined in section 3.109 of the ITU-T H.264 Specification. -
STD_VIDEO_H264_SLICE_TYPE_B
indicates that the slice is a B slice as defined in section 3.9 of the ITU-T H.264 Specification. -
STD_VIDEO_H264_SLICE_TYPE_I
indicates that the slice is an I slice as defined in section 3.66 of the ITU-T H.264 Specification.
Pictures constructed from such slices can be of different types, as defined
in section 7.4.2.4 of the ITU-T H.264 Specification.
Video encode operations using an H.264 encode profile can encode pictures of a specific type by specifying the corresponding enumeration constant value in StdVideoEncodeH264PictureInfo::primary_pic_type in the H.264 picture information from the Video Std enumeration type StdVideoH264PictureType:
-
STD_VIDEO_H264_PICTURE_TYPE_P
indicates that the picture is a P picture. A frame consisting of a P picture is also referred to as a P frame. -
STD_VIDEO_H264_PICTURE_TYPE_B
indicates that the picture is a B picture. A frame consisting of a B picture is also referred to as a B frame. -
STD_VIDEO_H264_PICTURE_TYPE_I
indicates that the picture is an I picture. A frame consisting of an I picture is also referred to as an I frame. -
STD_VIDEO_H264_PICTURE_TYPE_IDR
indicates that the picture is a special type of I picture called an IDR picture as defined in section 3.69 of the ITU-T H.264 Specification. A frame consisting of an IDR picture is also referred to as an IDR frame.
H.264 Coding Blocks
H.264 encode supports a single type of coding block called a macroblock, as defined in section 3.84 of the ITU-T H.264 Specification.
H.264 Encode Profile
H.264 Encode Capabilities
H.264 Encode Quality Level Properties
H.264 Encode Session
Additional parameters can be specified when creating a video session with an
H.264 encode profile by including an instance of the
VkVideoEncodeH264SessionCreateInfoKHR structure in the pNext
chain of VkVideoSessionCreateInfoKHR.
H.264 Encode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR
can contain the following types of parameters:
H.264 Sequence Parameter Sets (SPS)
Represented by StdVideoH264SequenceParameterSet
structures and
interpreted as follows:
- reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
- seq_parameter_set_id is used as the key of the SPS entry;
- level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
- if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
- if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
  - reserved1 is used only for padding purposes and is otherwise ignored;
  - if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - reserved1 is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
- all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.
H.264 Picture Parameter Sets (PPS)
Represented by StdVideoH264PictureParameterSet
structures and
interpreted as follows:
- the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
- if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
  - scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
  - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
- all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.
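The hedged sketch below shows one way such parameter sets could be added when creating an H.264 encode video session parameters object; the Video Std structures are assumed to be populated according to the interpretation rules above, and videoSession is an assumed H.264 encode video session handle.

```c
// Hedged sketch: creating an H.264 encode video session parameters object
// that initially stores one SPS and one PPS.
StdVideoH264SequenceParameterSet stdSPS = { /* seq_parameter_set_id = 0, ... */ 0 };
StdVideoH264PictureParameterSet  stdPPS = { /* pic_parameter_set_id = 0, ... */ 0 };

VkVideoEncodeH264SessionParametersAddInfoKHR addInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_SESSION_PARAMETERS_ADD_INFO_KHR,
    .stdSPSCount = 1,
    .pStdSPSs = &stdSPS,
    .stdPPSCount = 1,
    .pStdPPSs = &stdPPS,
};

VkVideoEncodeH264SessionParametersCreateInfoKHR h264ParamsCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_SESSION_PARAMETERS_CREATE_INFO_KHR,
    .maxStdSPSCount = 1,
    .maxStdPPSCount = 1,
    .pParametersAddInfo = &addInfo,
};

VkVideoSessionParametersCreateInfoKHR paramsCreateInfo = {
    .sType = VK_STRUCTURE_TYPE_VIDEO_SESSION_PARAMETERS_CREATE_INFO_KHR,
    .pNext = &h264ParamsCreateInfo,
    .videoSession = videoSession, // assumed: an H.264 encode video session
};

VkVideoSessionParametersKHR h264SessionParameters = VK_NULL_HANDLE;
vkCreateVideoSessionParametersKHR(device, &paramsCreateInfo, NULL, &h264SessionParameters);
```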
Implementations may override any of these parameters according to the semantics defined in the Video Encode Parameter Overrides section before storing the resulting H.264 parameter sets into the video session parameters object. Applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened and to retrieve the encoded H.264 parameter sets in order to be able to produce a compliant H.264 video bitstream.
Such H.264 parameter set overrides may also have cascading effects on the
implementation overrides applied to the encoded bitstream produced by video
encode operations.
If the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, then the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to the encoded bitstream.