Queries
Queries provide a mechanism to return information about the processing of a sequence of Vulkan commands. Query operations are asynchronous, and as such, their results are not returned immediately. Instead, their results, and their availability status are stored in a Query Pool. The state of these queries can be read back on the host, or copied to a buffer object on the device.
The supported query types are Occlusion Queries, Pipeline Statistics Queries, Result Status Queries, Video Encode Feedback Queries and Timestamp Queries. Performance Queries are supported if the associated extension is available. Transform Feedback Queries are supported if the associated extension is available. Intel Performance Queries are supported if the associated extension is available. Mesh Shader Queries are supported if the associated extension is available.
Several additional queries with specific purposes associated with ray tracing are available if the corresponding extensions are supported, as described for VkQueryType.
Query Pools
Query Operation
The operation of queries is controlled by the commands vkCmdBeginQuery, vkCmdEndQuery, vkCmdBeginQueryIndexedEXT, vkCmdEndQueryIndexedEXT, vkCmdResetQueryPool, vkCmdCopyQueryPoolResults, vkCmdWriteTimestamp2, and vkCmdWriteTimestamp.
In order for a VkCommandBuffer
to record query management commands,
the queue family for which its VkCommandPool
was created must support
the appropriate type of operations (graphics, compute) suitable for the
query type of a given query pool.
Each query in a query pool has a status that is either unavailable or available, and also has state to store the numerical results of a query operation of the type requested when the query pool was created. Resetting a query via vkCmdResetQueryPool or vkResetQueryPool sets the status to unavailable and makes the numerical results undefined:. A query is made available by the operation of vkCmdEndQuery, vkCmdEndQueryIndexedEXT, vkCmdWriteTimestamp2, or vkCmdWriteTimestamp. Both the availability status and numerical results can be retrieved by calling either vkGetQueryPoolResults or vkCmdCopyQueryPoolResults.
After query pool creation, each query is in an uninitialized state and must be reset before it is used. Queries must also be reset between uses.
If a logical device includes multiple physical devices, then each command that writes a query must execute on a single physical device, and any call to vkCmdBeginQuery must execute the corresponding vkCmdEndQuery command on the same physical device.
Once queries are reset and ready for use, query commands can be issued to a command buffer. Occlusion queries and pipeline statistics queries count events - drawn samples and pipeline stage invocations, respectively - resulting from commands that are recorded between a vkCmdBeginQuery command and a vkCmdEndQuery command within a specified command buffer, effectively scoping a set of drawing and/or dispatching commands. Timestamp queries write timestamps to a query pool. Performance queries record performance counters to a query pool.
A query must begin and end in the same command buffer, although if it is a
primary command buffer, and the inheritedQueries
feature is enabled, it can execute secondary
command buffers during the query operation.
For a secondary command buffer to be executed while a query is active, it
must set the occlusionQueryEnable
, queryFlags
, and/or
pipelineStatistics
members of VkCommandBufferInheritanceInfo to
conservative values, as described in the Command
Buffer Recording section.
A query must either begin and end inside the same subpass of a render pass
instance, or must both begin and end outside of a render pass instance
(i.e. contain entire render pass instances).
If queries are used while executing a render pass instance that has
multiview enabled, the query uses N consecutive query indices in the
query pool (starting at query
) where N is the number of bits set
in the view mask in the subpass the query is used in.
How the numerical results of the query are distributed among the queries is
implementation-dependent.
For example, some implementations may write each view’s results to a
distinct query, while other implementations may write the total result to
the first query and write zero to the other queries.
However, the sum of the results in all the queries must accurately reflect
the total result of the query summed over all views.
Applications can sum the results from all the queries to compute the total
result.
Queries used with multiview rendering must not span subpasses, i.e. they must begin and end in the same subpass.
A query must either begin and end inside the same video coding scope, or must both begin and end outside of a video coding scope and must not contain entire video coding scopes.
An application can retrieve results either by requesting they be written
into application-provided memory, or by requesting they be copied into a
VkBuffer
.
In either case, the layout in memory is defined as follows:
- The first query’s result is written starting at the first byte requested
by the command, and each subsequent query’s result begins
stride
bytes later. - Occlusion queries, pipeline statistics queries, transform feedback queries, primitives generated queries, mesh shader queries, video encode feedback queries, and timestamp queries store results in a tightly packed array of unsigned integers, either 32- or 64-bits as requested by the command, storing the numerical results and, if requested, the availability status.
- Performance queries store results in a tightly packed array whose type
is determined by the
unit
member of the corresponding VkPerformanceCounterKHR. - If
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT
is used, the final element of each query’s result is an integer indicating whether the query’s result is available, with any non-zero value indicating that it is available. - If
VK_QUERY_RESULT_WITH_STATUS_BIT_KHR
is used, the final element of each query’s result is an integer value indicating that status of the query result. Positive values indicate success, negative values indicate failure, and 0 indicates that the result is not yet available. Specific error codes are encoded in the VkQueryResultStatusKHR enumeration. - Occlusion queries write one integer value - the number of samples
passed.
Pipeline statistics queries write one integer value for each bit that is
enabled in the
pipelineStatistics
when the pool is created, and the statistics values are written in bit order starting from the least significant bit. Timestamp queries write one integer value. Performance queries write one VkPerformanceCounterResultKHR value for each VkPerformanceCounterKHR in the query. Transform feedback queries write two integers; the first integer is the number of primitives successfully written to the corresponding transform feedback buffer and the second is the number of primitives output to the vertex stream, regardless of whether they were successfully captured or not. In other words, if the transform feedback buffer was sized too small for the number of primitives output by the vertex stream, the first integer represents the number of primitives actually written and the second is the number that would have been written if all the transform feedback buffers associated with that vertex stream were large enough. Primitives generated queries write the number of primitives output to the vertex stream, regardless of whether transform feedback is active or not, or whether they were successfully captured by transform feedback or not. This is identical to the second integer of the transform feedback queries if transform feedback is active. Mesh shader queries write a single integer. Video encode feedback queries write one or more integer values for each bit that is enabled in VkQueryPoolVideoEncodeFeedbackCreateInfoKHR::encodeFeedbackFlags
when the pool is created, and the feedback values are written in bit order starting from the least significant bit, as described here. - If more than one query is retrieved and
stride
is not at least as large as the size of the array of values corresponding to a single query, the values written to memory are undefined:.
Rendering operations such as clears, MSAA resolves, attachment load/store operations, and blits may count towards the results of queries. This behavior is implementation-dependent and may vary depending on the path used within an implementation. For example, some implementations have several types of clears, some of which may include vertices and some not.
Occlusion Queries
Occlusion queries track the number of samples that pass the per-fragment
tests for a set of drawing commands.
As such, occlusion queries are only available on queue families supporting
graphics operations.
The application can then use these results to inform future rendering
decisions.
An occlusion query is begun and ended by calling vkCmdBeginQuery
and
vkCmdEndQuery
, respectively.
When an occlusion query begins, the count of passing samples always starts
at zero.
For each drawing command, the count is incremented as described in
Sample Counting.
If flags
does not contain VK_QUERY_CONTROL_PRECISE_BIT
an
implementation may generate any non-zero result value for the query if the
count of passing samples is non-zero.
Not setting VK_QUERY_CONTROL_PRECISE_BIT
mode may be more efficient
on some implementations, and should be used where it is sufficient to know
a boolean result on whether any samples passed the per-fragment tests.
In this case, some implementations may only return zero or one, indifferent
to the actual number of samples passing the per-fragment tests.
Setting VK_QUERY_CONTROL_PRECISE_BIT
does not guarantee that different
implementations return the same number of samples in an occlusion query.
Some implementations may kill fragments in the
pre-rasterization shader
stage, and these killed fragments do not contribute to the final result of
the query.
It is possible that some implementations generate a zero result value for
the query, while others generate a non-zero value.
When an occlusion query finishes, the result for that query is marked as
available.
The application can then either copy the result to a buffer (via
vkCmdCopyQueryPoolResults
) or request it be put into host memory (via
vkGetQueryPoolResults
).
If occluding geometry is not drawn first, samples can pass the depth test, but still not be visible in a final image.
Pipeline Statistics Queries
Pipeline statistics queries allow the application to sample a specified set
of VkPipeline
counters.
These counters are accumulated by Vulkan for a set of either drawing or
dispatching commands while a pipeline statistics query is active.
As such, pipeline statistics queries are available on queue families
supporting either graphics or compute operations.
The availability of pipeline statistics queries is indicated by the
pipelineStatisticsQuery
member of the VkPhysicalDeviceFeatures
object (see vkGetPhysicalDeviceFeatures
and vkCreateDevice
for
detecting and requesting this query type on a VkDevice
).
A pipeline statistics query is begun and ended by calling
vkCmdBeginQuery
and vkCmdEndQuery
, respectively.
When a pipeline statistics query begins, all statistics counters are set to
zero.
While the query is active, the pipeline type determines which set of
statistics are available, but these must be configured on the query pool
when it is created.
If a statistic counter is issued on a command buffer that does not support
the corresponding operation, or the counter corresponds to a shading stage
which is missing from any of the pipelines used while the query is active,
the value of that counter is undefined: after the query has been made
available.
At least one statistic counter relevant to the operations supported on the
recording command buffer must be enabled.
Timestamp Queries
Timestamps provide applications with a mechanism for timing the execution
of commands.
A timestamp is an integer value generated by the VkPhysicalDevice
.
Unlike other queries, timestamps do not operate over a range, and so do not
use vkCmdBeginQuery or vkCmdEndQuery.
The mechanism is built around a set of commands that allow the application
to tell the VkPhysicalDevice
to write timestamp values to a
query pool and then either read timestamp values on the
host (using vkGetQueryPoolResults) or copy timestamp values to a
VkBuffer
(using vkCmdCopyQueryPoolResults).
The application can then compute differences between timestamps to
determine execution time.
The number of valid bits in a timestamp value is determined by the
VkQueueFamilyProperties
::timestampValidBits
property of the
queue on which the timestamp is written.
Timestamps are supported on any queue which reports a non-zero value for
timestampValidBits
via vkGetPhysicalDeviceQueueFamilyProperties.
If the timestampComputeAndGraphics
limit is VK_TRUE
, timestamps are
supported by every queue family that supports either graphics or compute
operations (see VkQueueFamilyProperties).
The number of nanoseconds it takes for a timestamp value to be incremented
by 1 can be obtained from
VkPhysicalDeviceLimits
::timestampPeriod
after a call to
vkGetPhysicalDeviceProperties
.
Performance Queries
Performance queries provide applications with a mechanism for getting performance counter information about the execution of command buffers, render passes, and commands.
Each queue family advertises the performance counters that can be queried on a queue of that family via a call to vkEnumeratePhysicalDeviceQueueFamilyPerformanceQueryCountersKHR. Implementations may limit access to performance counters based on platform requirements or only to specialized drivers for development purposes.
This may include no performance counters being enumerated, or a reduced set. Please refer to platform-specific documentation for guidance on any such restrictions.
Performance queries use the existing vkCmdBeginQuery and vkCmdEndQuery to control what command buffers, render passes, or commands to get performance information for.
Implementations may require multiple passes where the command buffer, render passes, or commands being recorded are the same and are executed on the same queue to record performance counter data. This is achieved by submitting the same batch and providing a VkPerformanceQuerySubmitInfoKHR structure containing a counter pass index. The number of passes required for a given performance query pool can be queried via a call to vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR.
Command buffers created with
VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT
must not be re-submitted.
Changing command buffer usage bits may affect performance.
To avoid this, the application should re-record any command buffers with
the VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT
when multiple counter
passes are required.
Performance counter results from a performance query pool can be obtained with the command vkGetQueryPoolResults.
Profiling Lock
Transform Feedback Queries
Transform feedback queries track the number of primitives attempted to be
written and actually written, by the vertex stream being captured, to a
transform feedback buffer.
This query is updated during drawing commands while transform feedback is
active.
The number of primitives actually written will be less than the number
attempted to be written if the bound transform feedback buffer size was too
small for the number of primitives actually drawn.
Primitives are not written beyond the bound range of the transform feedback
buffer.
A transform feedback query is begun and ended by calling
vkCmdBeginQuery
and vkCmdEndQuery
, respectively to query for
vertex stream zero.
vkCmdBeginQueryIndexedEXT
and vkCmdEndQueryIndexedEXT
can be
used to begin and end transform feedback queries for any supported vertex
stream.
When a transform feedback query begins, the count of primitives written and
primitives needed starts from zero.
For each drawing command, the count is incremented as vertex attribute
outputs are captured to the transform feedback buffers while transform
feedback is active.
When a transform feedback query finishes, the result for that query is
marked as available.
The application can then either copy the result to a buffer (via
vkCmdCopyQueryPoolResults
) or request it be put into host memory (via
vkGetQueryPoolResults
).
Primitives Generated Queries
When a generated primitive query for a vertex stream is active, the
primitives-generated count is incremented every time a primitive emitted to
that stream reaches the transform feedback stage, whether or not transform
feedback is active.
A primitives generated query is begun and ended by calling
vkCmdBeginQuery
and vkCmdEndQuery
, respectively to query for
vertex stream zero.
vkCmdBeginQueryIndexedEXT
and vkCmdEndQueryIndexedEXT
can be
used to begin and end primitives generated queries for any supported vertex
stream.
When a primitives generated query begins, the count of primitives generated
starts from zero.
When a primitives generated query finishes, the result for that query is
marked as available.
The application can then either copy the result to a buffer (via
vkCmdCopyQueryPoolResults
) or request it be put into host memory (via
vkGetQueryPoolResults
).
The result of this query is typically identical to
VK_QUERY_PIPELINE_STATISTIC_CLIPPING_INVOCATIONS_BIT
, but the
primitives generated query is deterministic, i.e. it must be identical to
the number of primitives processed.
VK_QUERY_PIPELINE_STATISTIC_CLIPPING_INVOCATIONS_BIT
may vary for
implementation-dependent reasons, e.g. the same primitive may be processed
multiple times for purposes of clipping.
Mesh Shader Queries
When a generated mesh primitives query is active, the mesh-primitives-generated count is incremented every time a primitive emitted from the mesh shader stage reaches the fragment shader stage. When a generated mesh primitives query begins, the mesh-primitives-generated count starts from zero.
Mesh and task shader pipeline statistics queries function the same way that invocation queries work for other shader stages, counting the number of times the respective shader stage has been run. When the statistics query begins, the invocation counters start from zero.
Intel Performance Queries
Intel performance queries allow an application to capture performance data for a set of commands. Performance queries are used in a similar way than other types of queries. A main difference with existing queries is that the resulting data should be handed over to a library capable to produce human readable results rather than being read directly by an application.
Result Status Queries
Result status queries serve a single purpose: allowing the application to
determine whether a set of operations have completed successfully or not, as
indicated by the VkQueryResultStatusKHR value written when retrieving
the result of a query using the VK_QUERY_RESULT_WITH_STATUS_BIT_KHR
flag.
Unlike other query types, result status queries do not track or maintain any other data beyond the completion status, thus no other data is written when retrieving their results.
Support for result status queries is indicated by
VkQueueFamilyQueryResultStatusPropertiesKHR::queryResultStatusSupport
, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 for the
queue family in question.
Video Encode Feedback Queries
Video encode feedback queries allow the application to capture feedback
values generated by video encode operations.
As such, video encode feedback queries are available on queue families
supporting video encode operations.
The availability of individual video encode feedback values is indicated by
the bits of
VkVideoEncodeCapabilitiesKHR::supportedEncodeFeedbackFlags
, as
returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the
video profile the queries are intended to be used with.
The set of enabled video encode feedback values must be configured on the
query pool when it is created using the encodeFeedbackFlags
member of
the VkQueryPoolVideoEncodeFeedbackCreateInfoKHR included in the
pNext
chain of VkQueryPoolCreateInfo.